<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Nicholas Volkhin</title>
    <description>The latest articles on Forem by Nicholas Volkhin (@sbwerewolf).</description>
    <link>https://forem.com/sbwerewolf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875429%2F4b72e857-51ba-48a7-9f85-069e61293461.jpg</url>
      <title>Forem: Nicholas Volkhin</title>
      <link>https://forem.com/sbwerewolf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sbwerewolf"/>
    <language>en</language>
    <item>
      <title>When to Use XmlExtractKit Instead of General XML Tools in PHP</title>
      <dc:creator>Nicholas Volkhin</dc:creator>
      <pubDate>Fri, 17 Apr 2026 19:50:21 +0000</pubDate>
      <link>https://forem.com/sbwerewolf/when-to-use-xmlextractkit-instead-of-general-xml-tools-in-php-344i</link>
      <guid>https://forem.com/sbwerewolf/when-to-use-xmlextractkit-instead-of-general-xml-tools-in-php-344i</guid>
      <description>&lt;p&gt;One of the easiest ways to make XML work painful in PHP is to start with the wrong question.&lt;/p&gt;

&lt;p&gt;A lot of developers ask:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What is the best XML library for PHP?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That sounds reasonable, but it is usually the wrong framing.&lt;/p&gt;

&lt;p&gt;There is no single best XML tool for every job.&lt;/p&gt;

&lt;p&gt;The real question is narrower:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What kind of XML task am I solving?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That matters because XML work in PHP usually falls into very different categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;load a small document and read a few values;&lt;/li&gt;
&lt;li&gt;manipulate the full document tree;&lt;/li&gt;
&lt;li&gt;stream through a large file safely;&lt;/li&gt;
&lt;li&gt;extract repeated business records from large XML;&lt;/li&gt;
&lt;li&gt;validate or transform XML as XML.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not the same problem, so they should not lead to the same tool choice.&lt;/p&gt;

&lt;p&gt;This is the distinction I care about when I use or build XML tooling for PHP.&lt;/p&gt;

&lt;p&gt;My package, &lt;strong&gt;XmlExtractKit&lt;/strong&gt; (&lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;), is not trying to win every XML scenario. It is built for one narrower and very common class of work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;large XML → selected nodes → plain PHP arrays&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If that is your actual task, it can be a much better fit than a more general XML tool. If it is not your task, you should probably use something else.&lt;/p&gt;
&lt;h2&gt;
  
  
  First: what XmlExtractKit is actually for
&lt;/h2&gt;

&lt;p&gt;Before comparing tool categories, it helps to be explicit about the package goal.&lt;/p&gt;

&lt;p&gt;XmlExtractKit is built for the boring XML jobs that show up in real systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;feeds;&lt;/li&gt;
&lt;li&gt;imports and exports;&lt;/li&gt;
&lt;li&gt;marketplace catalogs;&lt;/li&gt;
&lt;li&gt;partner integrations;&lt;/li&gt;
&lt;li&gt;ETL pipelines;&lt;/li&gt;
&lt;li&gt;SOAP-ish payloads;&lt;/li&gt;
&lt;li&gt;legacy endpoints where XML is still the transport format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those systems, the application usually does not want to live inside an XML tree.&lt;/p&gt;

&lt;p&gt;It usually wants to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read XML safely;&lt;/li&gt;
&lt;li&gt;extract only matching records;&lt;/li&gt;
&lt;li&gt;convert them to arrays;&lt;/li&gt;
&lt;li&gt;continue with validation, normalization, persistence, or queue 
publishing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the package is centered around entry points such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;FastXmlToArray::prettyPrint()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FastXmlToArray::convert()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FastXmlParser::extractPrettyPrint()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FastXmlParser::extractHierarchy()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;XmlElement&lt;/code&gt; for traversal of normalized arrays.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the right comparison is not “Is this package better than every XML library?”&lt;/p&gt;

&lt;p&gt;The right comparison is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is my task primarily full-document XML work, or is it extraction-oriented application work?&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Category 1: small XML, simple reads
&lt;/h2&gt;

&lt;p&gt;Sometimes the job is tiny.&lt;/p&gt;

&lt;p&gt;You receive a small XML payload, you need two or three values, and that is it.&lt;/p&gt;

&lt;p&gt;Typical cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read a config-like XML file;&lt;/li&gt;
&lt;li&gt;parse a short API response;&lt;/li&gt;
&lt;li&gt;inspect a small test fixture;&lt;/li&gt;
&lt;li&gt;run a one-off maintenance script.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For work like that, convenience usually matters more than architecture.&lt;/p&gt;

&lt;p&gt;A simple API that loads the whole document can be perfectly fine because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the file is small;&lt;/li&gt;
&lt;li&gt;memory usage is not a concern;&lt;/li&gt;
&lt;li&gt;the code is short-lived or trivial;&lt;/li&gt;
&lt;li&gt;you do not need a reusable extraction workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is &lt;strong&gt;not&lt;/strong&gt; where I would reach for XmlExtractKit first.&lt;/p&gt;

&lt;p&gt;If the XML is small and the task is simple, the cheapest solution is often the right one.&lt;/p&gt;
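
&lt;p&gt;As a rough illustration of that cheapest solution, here is a minimal sketch using PHP's built-in SimpleXML extension. The payload and its element names (&lt;code&gt;settings&lt;/code&gt;, &lt;code&gt;endpoint&lt;/code&gt;, &lt;code&gt;timeout&lt;/code&gt;) are made up for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical config-like payload; element names are illustrative only.
$xml = &amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;settings env="staging"&amp;gt;
    &amp;lt;endpoint&amp;gt;https://example.test/api&amp;lt;/endpoint&amp;gt;
    &amp;lt;timeout&amp;gt;30&amp;lt;/timeout&amp;gt;
&amp;lt;/settings&amp;gt;
XML;

// Loads the whole document into memory, which is fine at this size.
$settings = simplexml_load_string($xml);
if ($settings === false) {
    throw new RuntimeException('Invalid XML payload');
}

$env = (string) $settings['env'];
$endpoint = (string) $settings-&amp;gt;endpoint;
$timeout = (int) $settings-&amp;gt;timeout;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Three reads, no streaming, no extra dependency. For a payload of this size that is usually all the structure the task needs.&lt;/p&gt;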
&lt;h2&gt;
  
  
  Category 2: full-document manipulation
&lt;/h2&gt;

&lt;p&gt;There is another class of XML work that is very different from extraction.&lt;/p&gt;

&lt;p&gt;Sometimes you really do need the whole document tree.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;insert or remove nodes across different branches;&lt;/li&gt;
&lt;li&gt;reorder sections of the document;&lt;/li&gt;
&lt;li&gt;update attributes in multiple places;&lt;/li&gt;
&lt;li&gt;build or rewrite XML as XML;&lt;/li&gt;
&lt;li&gt;perform document-level transformations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a full-document problem.&lt;/p&gt;

&lt;p&gt;In that case, tree-oriented tools and more general XML tooling make much more sense than an extraction-first package.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because the center of gravity is different.&lt;/p&gt;

&lt;p&gt;You are not trying to stream through repeated records and emit arrays. You are trying to work with the XML document itself as a structured tree.&lt;/p&gt;

&lt;p&gt;XmlExtractKit is not trying to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an XML editor;&lt;/li&gt;
&lt;li&gt;a full XML query language;&lt;/li&gt;
&lt;li&gt;a schema validation framework;&lt;/li&gt;
&lt;li&gt;a document transformation engine;&lt;/li&gt;
&lt;li&gt;a large abstraction layer over every XML concern.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your real work is document-wide manipulation, use tools designed for document-wide manipulation.&lt;/p&gt;
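
&lt;p&gt;To make the contrast concrete, here is a small sketch of document-wide work using PHP's built-in &lt;code&gt;DOMDocument&lt;/code&gt; and &lt;code&gt;DOMXPath&lt;/code&gt;. The catalog structure and the &lt;code&gt;status&lt;/code&gt; attribute are hypothetical; the point is only that the input and the output are both XML.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical catalog; element and attribute names are illustrative only.
$xml = &amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;catalog&amp;gt;
    &amp;lt;section id="a"&amp;gt;
        &amp;lt;item status="draft"&amp;gt;One&amp;lt;/item&amp;gt;
    &amp;lt;/section&amp;gt;
    &amp;lt;section id="b"&amp;gt;
        &amp;lt;item status="draft"&amp;gt;Two&amp;lt;/item&amp;gt;
        &amp;lt;item status="obsolete"&amp;gt;Three&amp;lt;/item&amp;gt;
    &amp;lt;/section&amp;gt;
&amp;lt;/catalog&amp;gt;
XML;

$document = new DOMDocument();
$document-&amp;gt;loadXML($xml);
$xpath = new DOMXPath($document);

// Update an attribute in several branches of the tree.
foreach ($xpath-&amp;gt;query('//item[@status="draft"]') as $item) {
    $item-&amp;gt;setAttribute('status', 'published');
}

// Remove nodes wherever they appear. The node list is copied to an array
// first, as a precaution against modifying the tree while iterating.
foreach (iterator_to_array($xpath-&amp;gt;query('//item[@status="obsolete"]')) as $item) {
    $item-&amp;gt;parentNode-&amp;gt;removeChild($item);
}

// The result is still XML, not an array.
$result = $document-&amp;gt;saveXML();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing here is about emitting records. The whole tree has to be in memory because any branch may change.&lt;/p&gt;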
&lt;h2&gt;
  
  
  Category 3: large XML, but low-level control is enough
&lt;/h2&gt;

&lt;p&gt;Now we get closer to the problems XmlExtractKit is meant to address.&lt;/p&gt;

&lt;p&gt;Suppose the XML file is large.&lt;/p&gt;

&lt;p&gt;You know loading it fully is a bad idea, so you switch to &lt;code&gt;XMLReader&lt;/code&gt; and stream through it node by node. That is already the correct direction.&lt;/p&gt;

&lt;p&gt;For some projects, raw &lt;code&gt;XMLReader&lt;/code&gt; is enough.&lt;/p&gt;

&lt;p&gt;That is true when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the extraction rule is very simple;&lt;/li&gt;
&lt;li&gt;the script is one-off;&lt;/li&gt;
&lt;li&gt;the output shape is minimal;&lt;/li&gt;
&lt;li&gt;you do not expect to reuse the logic;&lt;/li&gt;
&lt;li&gt;you are comfortable writing and maintaining cursor-level code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those cases, a hand-written loop is often fine.&lt;/p&gt;

&lt;p&gt;A minimal baseline might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readOuterXML&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nv"&gt;$offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;simplexml_load_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;

        &lt;span class="c1"&gt;// process $data&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is nothing wrong with this if the task stays small.&lt;/p&gt;

&lt;p&gt;The problem is that many XML integrations do not stay small.&lt;/p&gt;

&lt;p&gt;Sooner or later, you accumulate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more fields;&lt;/li&gt;
&lt;li&gt;optional nodes;&lt;/li&gt;
&lt;li&gt;attributes and values in different places;&lt;/li&gt;
&lt;li&gt;nested structures;&lt;/li&gt;
&lt;li&gt;repeated child elements;&lt;/li&gt;
&lt;li&gt;normalization rules;&lt;/li&gt;
&lt;li&gt;multiple feeds with similar logic;&lt;/li&gt;
&lt;li&gt;duplication across projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the point where low-level control stops being the main concern.&lt;/p&gt;

&lt;p&gt;The main concern becomes &lt;strong&gt;maintainable extraction&lt;/strong&gt;.&lt;/p&gt;
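
&lt;p&gt;Here is a sketch of how the baseline loop tends to grow once optional nodes, nested attributes, and repeated children appear. It uses only &lt;code&gt;XMLReader&lt;/code&gt; and SimpleXML; the feed fragment and the field names (&lt;code&gt;tag&lt;/code&gt;, &lt;code&gt;currency&lt;/code&gt;) are invented for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical feed fragment; element names are illustrative only.
$xml = &amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;offers&amp;gt;
    &amp;lt;offer id="1"&amp;gt;
        &amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;
        &amp;lt;price currency="USD"&amp;gt;129.90&amp;lt;/price&amp;gt;
        &amp;lt;tag&amp;gt;usb&amp;lt;/tag&amp;gt;
        &amp;lt;tag&amp;gt;rgb&amp;lt;/tag&amp;gt;
    &amp;lt;/offer&amp;gt;
    &amp;lt;offer id="2"&amp;gt;
        &amp;lt;name&amp;gt;Mouse&amp;lt;/name&amp;gt;
    &amp;lt;/offer&amp;gt;
&amp;lt;/offers&amp;gt;
XML;

$reader = XMLReader::XML($xml);
$records = [];

while ($reader-&amp;gt;read()) {
    if ($reader-&amp;gt;nodeType !== XMLReader::ELEMENT || $reader-&amp;gt;name !== 'offer') {
        continue;
    }

    $offer = simplexml_load_string($reader-&amp;gt;readOuterXML());

    // Repeated children need their own collection loop.
    $tags = [];
    foreach ($offer-&amp;gt;tag as $tag) {
        $tags[] = (string) $tag;
    }

    // Optional nodes and nested attributes each need an explicit guard.
    $records[] = [
        'id' =&amp;gt; (string) $offer['id'],
        'name' =&amp;gt; (string) $offer-&amp;gt;name,
        'price' =&amp;gt; isset($offer-&amp;gt;price) ? (string) $offer-&amp;gt;price : null,
        'currency' =&amp;gt; isset($offer-&amp;gt;price['currency'])
            ? (string) $offer-&amp;gt;price['currency']
            : null,
        'tags' =&amp;gt; $tags,
    ];
}

$reader-&amp;gt;close();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;None of these rules is hard on its own. The cost comes from repeating them, slightly differently, in every feed and every project.&lt;/p&gt;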

&lt;h2&gt;
  
  
  Category 4: large XML and repeated extraction tasks
&lt;/h2&gt;

&lt;p&gt;This is the sweet spot for XmlExtractKit.&lt;/p&gt;

&lt;p&gt;If your task looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open a large XML stream;&lt;/li&gt;
&lt;li&gt;select only matching elements;&lt;/li&gt;
&lt;li&gt;convert them to arrays;&lt;/li&gt;
&lt;li&gt;hand those arrays to application code;&lt;/li&gt;
&lt;li&gt;repeat this pattern across projects;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then a focused extraction toolkit is often the better choice.&lt;/p&gt;

&lt;p&gt;The value is not just performance. The value is the shape of the code.&lt;/p&gt;

&lt;p&gt;A streaming extraction example with XmlExtractKit looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Parsing\FastXmlParser&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;FastXmlParser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;extractPrettyPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XMLReader&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
            &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// process $offer as a plain PHP array&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, when you want a stable normalized structure for traversal and not just a pretty-printed array:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Parsing\FastXmlParser&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;FastXmlParser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;extractHierarchy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XMLReader&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
            &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// process normalized hierarchy&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is still a streaming model. It still relies on &lt;code&gt;XMLReader&lt;/code&gt; underneath. But the application code is now centered on the real task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;target the elements you care about;&lt;/li&gt;
&lt;li&gt;get arrays back;&lt;/li&gt;
&lt;li&gt;continue with your business pipeline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the exact problem the package is trying to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Category 5: XML as XML versus XML as transport
&lt;/h2&gt;

&lt;p&gt;This distinction is more important than it sounds.&lt;/p&gt;

&lt;p&gt;Some teams work with XML as a primary document format. In that world, XML structure itself is the thing they care about most.&lt;/p&gt;

&lt;p&gt;Other teams work with XML only because an external system forces them to.&lt;/p&gt;

&lt;p&gt;In those projects, XML is just a transport envelope.&lt;/p&gt;

&lt;p&gt;The application does not want to “stay in XML.” It wants to get out of XML as early as possible.&lt;/p&gt;

&lt;p&gt;That is usually what happens in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;feed processing;&lt;/li&gt;
&lt;li&gt;integration middleware;&lt;/li&gt;
&lt;li&gt;import jobs;&lt;/li&gt;
&lt;li&gt;back-office syncs;&lt;/li&gt;
&lt;li&gt;data ingestion pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that describes your system, you will usually benefit more from:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;XML stream → selected nodes → arrays&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;than from a broad, document-centric XML toolkit.&lt;/p&gt;

&lt;p&gt;That is why &lt;code&gt;FastXmlToArray::prettyPrint()&lt;/code&gt; and &lt;code&gt;FastXmlToArray::convert()&lt;/code&gt; are important entry points in XmlExtractKit. They help you turn XML into application-friendly structures early instead of making the rest of your code care about cursor state or DOM traversal.&lt;/p&gt;
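
&lt;p&gt;A minimal sketch of that early exit from XML might look like the following. It assumes &lt;code&gt;prettyPrint()&lt;/code&gt; accepts an XML string in the same way the &lt;code&gt;convert()&lt;/code&gt; call in the next section does; the exact signature and the shape of the returned array should be checked against the package documentation. The &lt;code&gt;order&lt;/code&gt; payload is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;use SbWereWolf\XmlNavigator\Conversion\FastXmlToArray;

require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical payload; element names are illustrative only.
$xml = &amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;order id="42"&amp;gt;
    &amp;lt;customer&amp;gt;ACME&amp;lt;/customer&amp;gt;
&amp;lt;/order&amp;gt;
XML;

// Assumption: prettyPrint() takes an XML string, as convert() does.
$order = FastXmlToArray::prettyPrint($xml);

// Inspect the array once, then write the rest of the code against it.
var_dump($order);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After that single call, nothing downstream needs to know the data arrived as XML.&lt;/p&gt;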

&lt;h2&gt;
  
  
  Category 6: normalized traversal after conversion
&lt;/h2&gt;

&lt;p&gt;There is one more case where a focused extraction/conversion toolkit helps.&lt;/p&gt;

&lt;p&gt;Sometimes you do not want a raw “pretty” array for immediate processing. You want a stable internal shape you can traverse predictably.&lt;/p&gt;

&lt;p&gt;That is where &lt;code&gt;FastXmlToArray::convert()&lt;/code&gt; and &lt;code&gt;XmlElement&lt;/code&gt; fit nicely.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Conversion\FastXmlToArray&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Navigation\XmlElement&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;product sku="KB-1001"&amp;gt;
    &amp;lt;name&amp;gt;Mechanical Keyboard&amp;lt;/name&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;129.90&amp;lt;/price&amp;gt;
&amp;lt;/product&amp;gt;
XML;&lt;/span&gt;

&lt;span class="nv"&gt;$root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;XmlElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;FastXmlToArray&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="nv"&gt;$name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$root&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;current&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nv"&gt;$currency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$root&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;current&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;$price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$root&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;current&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful when you want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a stable normalized hierarchy;&lt;/li&gt;
&lt;li&gt;traversal without re-parsing XML repeatedly;&lt;/li&gt;
&lt;li&gt;a representation that is still close to XML structure, but easier 
to work with than raw cursor logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is another area where a focused toolkit can be more practical than either a tiny one-off parser or a much broader XML stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple decision matrix
&lt;/h2&gt;

&lt;p&gt;Here is the practical version.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your task&lt;/th&gt;
&lt;th&gt;Best starting point&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small XML, quick read, no reuse expected&lt;/td&gt;
&lt;td&gt;A simple full-document approach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full-document manipulation or transformation&lt;/td&gt;
&lt;td&gt;A document/tree-oriented XML tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large XML, one-off extraction, you are comfortable with low-level code&lt;/td&gt;
&lt;td&gt;Raw &lt;code&gt;XMLReader&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large XML, repeated record extraction, output should be arrays&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FastXmlParser::extractPrettyPrint()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large XML, repeated record extraction, but you want normalized hierarchy&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FastXmlParser::extractHierarchy()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convert XML to arrays for later traversal&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;FastXmlToArray::convert()&lt;/code&gt; + &lt;code&gt;XmlElement&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convert XML to readable PHP arrays immediately&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FastXmlToArray::prettyPrint()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You need custom key names in the output structure&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;XmlConverter&lt;/code&gt; or &lt;code&gt;XmlParser&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is the most useful way to think about the package.&lt;/p&gt;

&lt;p&gt;Not as a universal XML winner, but as the right answer for a very specific class of jobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  When XmlExtractKit is probably the better fit
&lt;/h2&gt;

&lt;p&gt;I would reach for XmlExtractKit when most of these are true:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML can be large;&lt;/li&gt;
&lt;li&gt;I only need some of the document;&lt;/li&gt;
&lt;li&gt;the file contains repeated business records;&lt;/li&gt;
&lt;li&gt;I want arrays, not DOM-heavy application code;&lt;/li&gt;
&lt;li&gt;I expect similar extraction tasks in more than one project;&lt;/li&gt;
&lt;li&gt;I want to keep the rest of the system unaware of XML cursor 
mechanics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supplier catalog imports;&lt;/li&gt;
&lt;li&gt;marketplace feed ingestion;&lt;/li&gt;
&lt;li&gt;partner exports;&lt;/li&gt;
&lt;li&gt;ETL pipelines;&lt;/li&gt;
&lt;li&gt;XML payload normalization before queueing or persistence;&lt;/li&gt;
&lt;li&gt;old integrations being consumed by otherwise modern PHP systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When a general XML tool is probably the better fit
&lt;/h2&gt;

&lt;p&gt;I would &lt;strong&gt;not&lt;/strong&gt; pick XmlExtractKit first when most of these are true:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML is small;&lt;/li&gt;
&lt;li&gt;I need full-document traversal and rewriting;&lt;/li&gt;
&lt;li&gt;the end result should remain XML, not arrays;&lt;/li&gt;
&lt;li&gt;I need schema-heavy or transformation-heavy tooling;&lt;/li&gt;
&lt;li&gt;the work is more about XML documents than application data extraction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not a weakness of the package. It is exactly what a focused tool should look like.&lt;/p&gt;

&lt;p&gt;A sharp tool is useful because it knows what it is not trying to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical mistake to avoid
&lt;/h2&gt;

&lt;p&gt;The biggest mistake is to force every XML task into the same mental model.&lt;/p&gt;

&lt;p&gt;Developers often do one of these two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use a full-document approach for a large extraction problem;&lt;/li&gt;
&lt;li&gt;use low-level cursor code for a recurring application-level 
extraction problem that really wants a better abstraction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both create unnecessary cost.&lt;/p&gt;

&lt;p&gt;The first creates avoidable memory pressure and awkward processing flows.&lt;/p&gt;

&lt;p&gt;The second creates avoidable glue code and long-term maintenance pain.&lt;/p&gt;

&lt;p&gt;XmlExtractKit exists in the space between those mistakes.&lt;/p&gt;

&lt;p&gt;It is for the case where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;XMLReader&lt;/code&gt; is the right low-level engine,&lt;/li&gt;
&lt;li&gt;but raw &lt;code&gt;XMLReader&lt;/code&gt; is too close to the metal for the amount of extraction work you actually do.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The useful question is not:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What is the best XML tool in PHP?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The useful question is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Am I manipulating XML documents, or extracting application data from XML streams?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your task is primarily document-centric, general XML tools are the right place to start.&lt;/p&gt;

&lt;p&gt;If your task is primarily extraction-centric — especially for large feeds, repeated records, and array-based application pipelines — then XmlExtractKit can be a much better fit.&lt;/p&gt;

&lt;p&gt;That is the core positioning of the package:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;stream XML, extract only what matters, and keep working with plain PHP arrays.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If that is the problem you keep solving, then a focused tool is often more useful than a general one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require sbwerewolf/xml-navigator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explore the demo project
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SbWereWolf/xml-extract-kit-demo-repo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xml-extract-kit-demo-repo
composer &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>opensource</category>
      <category>php</category>
      <category>xml</category>
      <category>parsing</category>
    </item>
    <item>
      <title>Benchmark: XMLReader vs XmlExtractKit on a Real Extraction Scenario</title>
      <dc:creator>Nicholas Volkhin</dc:creator>
      <pubDate>Fri, 17 Apr 2026 11:48:34 +0000</pubDate>
      <link>https://forem.com/sbwerewolf/benchmark-xmlreader-vs-xmlextractkit-on-a-real-extraction-scenario-1608</link>
      <guid>https://forem.com/sbwerewolf/benchmark-xmlreader-vs-xmlextractkit-on-a-real-extraction-scenario-1608</guid>
      <description>&lt;p&gt;When people benchmark XML tools, they often compare the wrong things.&lt;/p&gt;

&lt;p&gt;They compare a full-document parser to a streaming parser. They compare one tool that returns DOM objects to another that returns arrays. They compare a micro-example that does not resemble production code. Or they publish a timing figure without showing what work was actually done.&lt;/p&gt;

&lt;p&gt;That is not useful.&lt;/p&gt;

&lt;p&gt;For real PHP projects, the right benchmark question is usually much narrower:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If the task is to stream through a large XML feed, extract repeated business records, and turn them into plain PHP arrays, what do I gain by using raw &lt;code&gt;XMLReader&lt;/code&gt; directly, and what do I gain by using a focused extraction library such as XmlExtractKit?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the comparison I care about.&lt;/p&gt;

&lt;p&gt;This article shows how I would benchmark that scenario in a way that is both fair and technically honest.&lt;/p&gt;
&lt;h2&gt;
  
  
  What the benchmark should measure
&lt;/h2&gt;

&lt;p&gt;For extraction-heavy workloads, “total runtime” is not enough.&lt;/p&gt;

&lt;p&gt;A useful benchmark should measure at least four things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;total wall-clock time;&lt;/li&gt;
&lt;li&gt;peak memory usage;&lt;/li&gt;
&lt;li&gt;time to first useful record;&lt;/li&gt;
&lt;li&gt;amount of userland code needed to express the task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is not a machine metric, but it matters. In many integrations, the long-term cost is not CPU time. It is the amount of extraction glue code you end up carrying from project to project.&lt;/p&gt;
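
&lt;p&gt;As a rough sketch, the first three metrics can be captured around any record generator with a small wrapper. The helper name &lt;code&gt;measureExtraction()&lt;/code&gt; and the keys of the returned array are assumptions made for this illustration; they are not part of &lt;code&gt;XMLReader&lt;/code&gt; or XmlExtractKit.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

declare(strict_types=1);

/**
 * Hypothetical helper (an assumption for this sketch): drains an
 * iterable of records and reports basic timing and memory metrics.
 *
 * @param iterable&amp;lt;array&amp;lt;string, mixed&amp;gt;&amp;gt; $records
 * @return array{records: int, total_seconds: float, first_record_seconds: ?float, peak_memory_bytes: int}
 */
function measureExtraction(iterable $records): array
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    $firstRecordAt = null;
    $count = 0;

    foreach ($records as $_) {
        if ($firstRecordAt === null) {
            // Moment the first record became available to the caller.
            $firstRecordAt = hrtime(true);
        }
        $count++;
    }

    $end = hrtime(true);

    return [
        'records' =&amp;gt; $count,
        'total_seconds' =&amp;gt; ($end - $start) / 1e9,
        'first_record_seconds' =&amp;gt; $firstRecordAt === null
            ? null
            : ($firstRecordAt - $start) / 1e9,
        // Peak for the whole process so far, not only for this call.
        'peak_memory_bytes' =&amp;gt; memory_get_peak_usage(true),
    ];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;memory_get_peak_usage()&lt;/code&gt; reports the peak for the whole process, it is safer to run each implementation in its own PHP process than to measure both in one script.&lt;/p&gt;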
&lt;h2&gt;
  
  
  The scenario: one realistic extraction task
&lt;/h2&gt;

&lt;p&gt;To keep the comparison fair, both approaches should solve the same task.&lt;/p&gt;

&lt;p&gt;Here is the scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the input is a large XML feed;&lt;/li&gt;
&lt;li&gt;the feed contains repeated &lt;code&gt;&amp;lt;offer&amp;gt;&lt;/code&gt; records;&lt;/li&gt;
&lt;li&gt;each offer has nested elements and attributes;&lt;/li&gt;
&lt;li&gt;we only care about offer records, not the rest of the document;&lt;/li&gt;
&lt;li&gt;the output for each record is a plain PHP array shaped for application use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is much closer to a real import or ETL job than an abstract “parse XML” benchmark.&lt;/p&gt;

&lt;p&gt;A minimal example of the feed structure looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;catalog&lt;/span&gt; &lt;span class="na"&gt;generated_at=&lt;/span&gt;&lt;span class="s"&gt;"2026-04-01T08:00:00Z"&lt;/span&gt; &lt;span class="na"&gt;region=&lt;/span&gt;&lt;span class="s"&gt;"eu"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;offer&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"1001"&lt;/span&gt; &lt;span class="na"&gt;available=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;sku&amp;gt;&lt;/span&gt;KB-1001&lt;span class="nt"&gt;&amp;lt;/sku&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Mechanical Keyboard&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;brand&amp;gt;&lt;/span&gt;Acme&lt;span class="nt"&gt;&amp;lt;/brand&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;category&amp;gt;&lt;/span&gt;Keyboards&lt;span class="nt"&gt;&amp;lt;/category&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;price&lt;/span&gt; &lt;span class="na"&gt;currency=&lt;/span&gt;&lt;span class="s"&gt;"USD"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;129.90&lt;span class="nt"&gt;&amp;lt;/price&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;stock&amp;gt;&lt;/span&gt;14&lt;span class="nt"&gt;&amp;lt;/stock&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/offer&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;service&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"svc-1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Extended Warranty&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/service&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;offer&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"1002"&lt;/span&gt; &lt;span class="na"&gt;available=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;sku&amp;gt;&lt;/span&gt;MS-1002&lt;span class="nt"&gt;&amp;lt;/sku&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Wireless Mouse&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;brand&amp;gt;&lt;/span&gt;Acme&lt;span class="nt"&gt;&amp;lt;/brand&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;category&amp;gt;&lt;/span&gt;Mice&lt;span class="nt"&gt;&amp;lt;/category&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;price&lt;/span&gt; &lt;span class="na"&gt;currency=&lt;/span&gt;&lt;span class="s"&gt;"USD"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;39.90&lt;span class="nt"&gt;&amp;lt;/price&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;stock&amp;gt;&lt;/span&gt;0&lt;span class="nt"&gt;&amp;lt;/stock&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/offer&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/catalog&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the benchmark, both implementations should produce records like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'external_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'sku'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'KB-1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Mechanical Keyboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'brand'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Acme'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'category'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Keyboards'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'129.90'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'stock'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'14'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That common output shape is important. If the two approaches do different work, the benchmark is meaningless.&lt;/p&gt;
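
&lt;p&gt;One way to enforce that before timing anything is to walk both record streams in lockstep and fail on the first difference. This is only a sketch: &lt;code&gt;assertSameRecords()&lt;/code&gt; is a hypothetical helper name, and it assumes both implementations yield records in document order.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

declare(strict_types=1);

/**
 * Hypothetical helper (an assumption for this sketch): compares two
 * record streams pairwise and returns the number of records compared.
 *
 * @param iterable&amp;lt;array&amp;lt;string, mixed&amp;gt;&amp;gt; $expected
 * @param iterable&amp;lt;array&amp;lt;string, mixed&amp;gt;&amp;gt; $actual
 */
function assertSameRecords(iterable $expected, iterable $actual): int
{
    // Wrap both inputs in generators so arrays and iterators behave alike.
    $wrap = static function (iterable $items): Generator {
        yield from $items;
    };

    $left = $wrap($expected);
    $right = $wrap($actual);
    $compared = 0;

    while ($left-&amp;gt;valid() &amp;amp;&amp;amp; $right-&amp;gt;valid()) {
        // Strict array comparison: same keys, same order, same value types.
        if ($left-&amp;gt;current() !== $right-&amp;gt;current()) {
            throw new RuntimeException("Record #{$compared} differs.");
        }

        $left-&amp;gt;next();
        $right-&amp;gt;next();
        $compared++;
    }

    if ($left-&amp;gt;valid() || $right-&amp;gt;valid()) {
        throw new RuntimeException(
            'The two streams yield a different number of records.'
        );
    }

    return $compared;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The strict comparison is deliberate: a record with &lt;code&gt;'stock' =&amp;gt; '14'&lt;/code&gt; and one with &lt;code&gt;'stock' =&amp;gt; 14&lt;/code&gt; count as different, so any mismatch in normalization shows up before the timing runs.&lt;/p&gt;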

&lt;h2&gt;
  
  
  What exactly is being compared
&lt;/h2&gt;

&lt;p&gt;I would compare these two implementations:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Raw &lt;code&gt;XMLReader&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the “do it yourself” baseline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;move the cursor manually;&lt;/li&gt;
&lt;li&gt;detect &lt;code&gt;&amp;lt;offer&amp;gt;&lt;/code&gt; nodes;&lt;/li&gt;
&lt;li&gt;read attributes and child elements;&lt;/li&gt;
&lt;li&gt;assemble arrays by hand;&lt;/li&gt;
&lt;li&gt;yield one normalized record at a time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;XmlExtractKit&lt;/code&gt; (&lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;This is the extraction-first approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;still stream through the XML using &lt;code&gt;XMLReader&lt;/code&gt; underneath;&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;FastXmlParser::extractPrettyPrint()&lt;/code&gt; to yield only matching nodes;&lt;/li&gt;
&lt;li&gt;normalize the resulting arrays into your application format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the comparison that matters in practice because these two approaches solve the same class of problem: &lt;strong&gt;streaming extraction of repeated records&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I am not publishing arbitrary numbers here
&lt;/h2&gt;

&lt;p&gt;A benchmark article becomes misleading very quickly when it includes numbers without context.&lt;/p&gt;

&lt;p&gt;Runtime depends on all of these things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PHP version and build;&lt;/li&gt;
&lt;li&gt;enabled extensions;&lt;/li&gt;
&lt;li&gt;CPU and storage;&lt;/li&gt;
&lt;li&gt;XML shape and depth;&lt;/li&gt;
&lt;li&gt;number of attributes;&lt;/li&gt;
&lt;li&gt;number of repeated child elements;&lt;/li&gt;
&lt;li&gt;whether your normalization step is trivial or heavy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of that, I think the honest way to present this benchmark is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;show the exact task;&lt;/li&gt;
&lt;li&gt;show both implementations;&lt;/li&gt;
&lt;li&gt;show the harness;&lt;/li&gt;
&lt;li&gt;explain what to measure;&lt;/li&gt;
&lt;li&gt;tell readers how to interpret the results.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That way the article stays useful even when the raw numbers differ from machine to machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: generate a reproducible XML fixture
&lt;/h2&gt;

&lt;p&gt;For a benchmark, hand-written miniature XML is not enough.&lt;/p&gt;

&lt;p&gt;You want a generated fixture with many repeated records so that the streaming behavior becomes visible.&lt;/p&gt;

&lt;p&gt;Here is a simple generator:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;declare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strict_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;generateCatalogFixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$fh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;fopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'wb'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s1"&gt;'Cannot open fixture file for writing.'&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Fixed timestamp keeps the fixture byte-for-byte reproducible.&lt;/span&gt;
    &lt;span class="nv"&gt;$date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-01T08:00:00Z'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;?xml version=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;1.0&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; encoding=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;UTF-8&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;?&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;catalog generated_at=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$date&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; region=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;eu&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nv"&gt;$offers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$available&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'false'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$brand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Brand-'&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Category-'&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;number_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$stock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"  &amp;lt;offer id=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; available=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$available&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"    &amp;lt;sku&amp;gt;SKU-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/sku&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"    &amp;lt;name&amp;gt;Product &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"    &amp;lt;brand&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$brand&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/brand&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"    &amp;lt;category&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$category&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/category&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"    &amp;lt;price currency=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;USD&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/price&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"    &amp;lt;stock&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$stock&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/stock&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"  &amp;lt;/offer&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"  &amp;lt;service id=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;svc-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;lt;name&amp;gt;Warranty&amp;lt;/name&amp;gt;&amp;lt;/service&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nb"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;/catalog&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nb"&gt;fclose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fh&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives both implementations identical input and enough repeated records to make the comparison meaningful.&lt;/p&gt;
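
&lt;p&gt;Before benchmarking, it can help to confirm that the fixture contains what you expect. The following is a minimal sketch of such a check; &lt;code&gt;countElements()&lt;/code&gt; is a hypothetical helper, and the file path in the usage comment is only an example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

declare(strict_types=1);

/**
 * Hypothetical helper (an assumption for this sketch): counts elements
 * with the given name by streaming through the file with XMLReader.
 */
function countElements(string $path, string $name): int
{
    $reader = new XMLReader();

    if ($reader-&amp;gt;open($path) === false) {
        throw new RuntimeException('Cannot open XML file.');
    }

    $count = 0;

    try {
        while ($reader-&amp;gt;read()) {
            if (
                $reader-&amp;gt;nodeType === XMLReader::ELEMENT
                &amp;amp;&amp;amp; $reader-&amp;gt;name === $name
            ) {
                $count++;
            }
        }
    } finally {
        $reader-&amp;gt;close();
    }

    return $count;
}

// Usage, assuming generateCatalogFixture() from the listing above is loaded:
// generateCatalogFixture('/tmp/catalog.xml', 5000);
// countElements('/tmp/catalog.xml', 'offer');   // 5000
// countElements('/tmp/catalog.xml', 'service'); // 5, one per 1000 offers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;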

&lt;h2&gt;
  
  
  Step 2: the raw XMLReader baseline
&lt;/h2&gt;

&lt;p&gt;The baseline should be direct and honest. It should not be intentionally ugly, but it should reflect the kind of code people really end up writing when they solve the problem manually.&lt;/p&gt;

&lt;p&gt;Here is one way to do that:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;declare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strict_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="cd"&gt;/**
 * @return Generator&amp;lt;int, array&amp;lt;string, mixed&amp;gt;&amp;gt;
 */&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;iterateOffersWithXmlReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Generator&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
                &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="nv"&gt;$depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="nv"&gt;$offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'external_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'available'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'sku'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'brand'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'category'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'stock'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;];&lt;/span&gt;

            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;END_ELEMENT&lt;/span&gt;
                    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
                    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nv"&gt;$depth&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;

                &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s1"&gt;'sku'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'sku'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s1"&gt;'brand'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'brand'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s1"&gt;'category'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'category'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s1"&gt;'stock'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'stock'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is nothing wrong with this code. In fact, for some one-off jobs it is a perfectly reasonable solution.&lt;/p&gt;

&lt;p&gt;But it already illustrates the tradeoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;manual cursor handling;&lt;/li&gt;
&lt;li&gt;explicit depth management;&lt;/li&gt;
&lt;li&gt;hand-written field extraction;&lt;/li&gt;
&lt;li&gt;record assembly mixed with traversal logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly what I want the benchmark to capture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: the XmlExtractKit implementation
&lt;/h2&gt;

&lt;p&gt;Now compare it with the extraction-first version.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;XmlExtractKit&lt;/code&gt; still uses the streaming model, but it moves the code closer to the business task: extract matching nodes, then normalize them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;declare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strict_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Parsing\FastXmlParser&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cd"&gt;/**
 * @param array&amp;lt;string, mixed&amp;gt; $node
 * @return array&amp;lt;string, mixed&amp;gt;
 */&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;normalizeOffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$node&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'offer'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nv"&gt;$attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'@attributes'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nv"&gt;$price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nv"&gt;$priceAttributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;is_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'@attributes'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'external_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$attributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nv"&gt;$attributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'available'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'sku'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'sku'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'brand'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'brand'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'category'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'category'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;is_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'@value'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$priceAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'stock'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'stock'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cd"&gt;/**
 * @return Generator&amp;lt;int, array&amp;lt;string, mixed&amp;gt;&amp;gt;
 */&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;iterateOffersWithXmlExtractKit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Generator&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;FastXmlParser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;extractPrettyPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XMLReader&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
                    &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
                    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$node&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nf"&gt;normalizeOffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$node&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The normalization step is explicit in both versions, which is good. The difference is where the complexity lives.&lt;/p&gt;

&lt;p&gt;With raw &lt;code&gt;XMLReader&lt;/code&gt;, traversal and record assembly are tightly coupled.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;XmlExtractKit&lt;/code&gt;, traversal stays streaming-based, but the extraction phase is lifted into a more reusable form.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: use a benchmark harness that measures the right things
&lt;/h2&gt;

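&lt;p&gt;A benchmark harness must consume the records fully. Both implementations are generators, so no parsing happens until something iterates over them, and a harness that skips the iteration measures almost nothing.&lt;/p&gt;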

&lt;p&gt;It should also record the time to the first yielded record, not just the final completion time.&lt;/p&gt;

&lt;p&gt;Here is a simple harness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;declare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strict_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="cd"&gt;/**
 * @param callable(): iterable&amp;lt;array&amp;lt;string, mixed&amp;gt;&amp;gt; $factory
 * @return array&amp;lt;string, int|float&amp;gt;
 */&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;benchmarkExtraction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;callable&lt;/span&gt; &lt;span class="nv"&gt;$factory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;gc_collect_cycles&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;function_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'memory_reset_peak_usage'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;memory_reset_peak_usage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nv"&gt;$startedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;hrtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$firstRecordMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$checksum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$factory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
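        &lt;span class="nv"&gt;$count&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// Read a field from every record so each one is actually used; matching checksums across runs are a cheap sanity check.&lt;/span&gt;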
        &lt;span class="nv"&gt;$checksum&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;strlen&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'external_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$firstRecordMs&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$firstRecordMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;hrtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nv"&gt;$startedAt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nv"&gt;$elapsedMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;hrtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nv"&gt;$startedAt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$peakMb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;memory_get_peak_usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'records'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'checksum'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$checksum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'first_record_ms'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$firstRecordMs&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'elapsed_ms'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$elapsedMs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'peak_memory_mb'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$peakMb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is how you would run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;declare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strict_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$fixture&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/catalog-benchmark.xml'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nf"&gt;generateCatalogFixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fixture&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$xmlReaderResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;benchmarkExtraction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;iterable&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;iterateOffersWithXmlReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fixture&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$xmlExtractKitResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;benchmarkExtraction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;iterable&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;iterateOffersWithXmlExtractKit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$fixture&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nb"&gt;var_export&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="s1"&gt;'xmlreader'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$xmlReaderResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'xmlextractkit'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$xmlExtractKitResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



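&lt;p&gt;That is enough to produce a reproducible benchmark on your own machine. One caveat: &lt;code&gt;memory_reset_peak_usage()&lt;/code&gt; only exists since PHP 8.2, so on older versions the peak reported for the second run also includes the first. There, run each implementation in a separate process.&lt;/p&gt;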
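
&lt;p&gt;Single runs tend to be noisy (disk cache, warm-up, other processes). One way to steady the numbers is to repeat each run and report the median. The sketch below assumes only that the benchmark returns an array with an &lt;code&gt;elapsed_ms&lt;/code&gt; key, as the harness above does. The helper names &lt;code&gt;median&lt;/code&gt; and &lt;code&gt;medianElapsedMs&lt;/code&gt; are hypothetical and belong neither to the harness nor to XmlExtractKit.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

declare(strict_types=1);

/**
 * Median of a non-empty list of numbers.
 *
 * @param list&amp;lt;int|float&amp;gt; $values
 */
function median(array $values): float
{
    if ($values === []) {
        throw new InvalidArgumentException('Cannot take the median of an empty list.');
    }

    sort($values);
    $count = count($values);
    $middle = intdiv($count, 2);

    // Odd count: the middle element. Even count: mean of the two middle elements.
    return $count % 2 === 1
        ? (float) $values[$middle]
        : ($values[$middle - 1] + $values[$middle]) / 2.0;
}

/**
 * Hypothetical helper: runs a benchmark several times, returns the median elapsed time.
 *
 * @param callable(): array&amp;lt;string, int|float&amp;gt; $benchmark result must contain 'elapsed_ms'
 */
function medianElapsedMs(callable $benchmark, int $runs = 5): float
{
    $samples = [];

    for ($i = 0; $i &amp;lt; $runs; $i++) {
        $samples[] = $benchmark()['elapsed_ms'];
    }

    return median($samples);
}

// Usage with the harness above (not executed here):
// $medianMs = medianElapsedMs(
//     static fn (): array =&amp;gt; benchmarkExtraction(
//         static fn (): iterable =&amp;gt; iterateOffersWithXmlReader($fixture)
//     )
// );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The median is used here rather than the mean because one slow outlier run would otherwise skew the result.&lt;/p&gt;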

&lt;h2&gt;
  
  
  What I would expect to see
&lt;/h2&gt;

&lt;p&gt;I would not assume that one implementation wins every metric.&lt;/p&gt;

&lt;p&gt;That is exactly why this comparison is interesting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Peak memory
&lt;/h3&gt;

&lt;p&gt;If both solutions are truly streaming and process one record at a time, peak memory should stay roughly flat in both implementations, no matter how many offers the file contains.&lt;/p&gt;

&lt;p&gt;If one of them starts materializing too much intermediate state, this metric will reveal it quickly.&lt;/p&gt;
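
&lt;p&gt;To see what materialized state looks like in this metric, here is a small standalone sketch that does not touch XML at all. &lt;code&gt;generateRecords&lt;/code&gt; is a hypothetical stand-in for either extractor, and &lt;code&gt;peakDuring&lt;/code&gt; is a hypothetical helper. The only point is the difference between consuming a generator lazily and collecting it with &lt;code&gt;iterator_to_array()&lt;/code&gt; first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for an extractor: yields one small record at a time.
 *
 * @return Generator&amp;lt;int, array&amp;lt;string, string&amp;gt;&amp;gt;
 */
function generateRecords(int $count): Generator
{
    for ($i = 1; $i &amp;lt;= $count; $i++) {
        yield ['external_id' =&amp;gt; 'offer-' . $i, 'name' =&amp;gt; str_repeat('x', 64)];
    }
}

/** Peak memory in bytes observed while $work runs. */
function peakDuring(callable $work): int
{
    gc_collect_cycles();

    // memory_reset_peak_usage() exists since PHP 8.2. Without it the peak
    // is cumulative, so the cheaper workload has to be measured first.
    if (function_exists('memory_reset_peak_usage')) {
        memory_reset_peak_usage();
    }

    $work();

    return memory_get_peak_usage();
}

// Streaming: each record can be freed before the next one is produced.
$streamingPeak = peakDuring(static function (): void {
    foreach (generateRecords(50_000) as $record) {
        // A real consumer would write $record to a database or queue here.
    }
});

// Materializing: all records are held in one array at the same time.
$materializedPeak = peakDuring(static function (): void {
    $all = iterator_to_array(generateRecords(50_000), false);
});

printf(
    "streaming peak: %d bytes\nmaterialized peak: %d bytes\n",
    $streamingPeak,
    $materializedPeak
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The materialized peak should come out well above the streaming one, and it grows with the record count while the streaming peak does not.&lt;/p&gt;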

&lt;h3&gt;
  
  
  Time to first record
&lt;/h3&gt;

&lt;p&gt;This is one of the most underrated metrics for XML extraction workloads.&lt;/p&gt;

&lt;p&gt;If your pipeline can start processing useful data almost immediately, that is a real engineering advantage. It matters for imports, progress reporting, partial processing, and backpressure-aware systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Total runtime
&lt;/h3&gt;

&lt;p&gt;This matters, but it should not dominate the interpretation.&lt;/p&gt;

&lt;p&gt;A lower-level implementation may sometimes squeeze out a small performance advantage. But if that advantage comes with much more traversal glue, branching, and duplicated code, it may not be the better engineering choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Userland code size and complexity
&lt;/h3&gt;

&lt;p&gt;This is not a synthetic concern.&lt;/p&gt;

&lt;p&gt;In production codebases, a solution that is 5% faster but significantly harder to review, extend, and reuse is often the more expensive solution.&lt;/p&gt;

&lt;p&gt;That is why I would always report both machine metrics and code-shape metrics side by side.&lt;/p&gt;
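
&lt;p&gt;A code-shape metric does not need to be elaborate. As one rough proxy, the sketch below counts the non-blank source lines of a function through &lt;code&gt;ReflectionFunction&lt;/code&gt;. &lt;code&gt;countFunctionLines&lt;/code&gt; is a hypothetical helper, and line count is only a crude stand-in for review effort.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

declare(strict_types=1);

/**
 * Hypothetical helper: counts the non-blank source lines of a user-defined function.
 */
function countFunctionLines(string $functionName): int
{
    $function = new ReflectionFunction($functionName);
    $file = $function-&amp;gt;getFileName();

    if ($file === false) {
        throw new InvalidArgumentException("{$functionName} is not a user-defined function.");
    }

    $lines = file($file, FILE_IGNORE_NEW_LINES);

    if ($lines === false) {
        throw new RuntimeException("Cannot read {$file}.");
    }

    // getStartLine() and getEndLine() are 1-based and inclusive.
    $start = $function-&amp;gt;getStartLine();
    $end = $function-&amp;gt;getEndLine();
    $source = array_slice($lines, $start - 1, $end - $start + 1);

    return count(array_filter(
        $source,
        static fn (string $line): bool =&amp;gt; trim($line) !== ''
    ));
}

// Measures this helper itself; in the benchmark you would pass the names
// of the two extractor functions instead.
printf("%d non-blank lines\n", countFunctionLines('countFunctionLines'));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;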

&lt;h2&gt;
  
  
  What would make the benchmark unfair
&lt;/h2&gt;

&lt;p&gt;A benchmark like this becomes unreliable very quickly if you do any of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compare different output shapes;&lt;/li&gt;
&lt;li&gt;parse different XML structures;&lt;/li&gt;
&lt;li&gt;make one side do more normalization work;&lt;/li&gt;
&lt;li&gt;compare a streaming solution to a full-document solution;&lt;/li&gt;
&lt;li&gt;benchmark tiny files that do not stress the streaming model;&lt;/li&gt;
&lt;li&gt;omit the code and only publish the result table.&lt;/li&gt;
&lt;/ul&gt;
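
&lt;p&gt;The first three points can be checked mechanically before any timing is done. A minimal sketch, assuming both extractors yield associative arrays: walk the two streams in lockstep and fail on the first mismatch. &lt;code&gt;toGenerator&lt;/code&gt; and &lt;code&gt;assertSameRecords&lt;/code&gt; are hypothetical helper names, not part of the harness above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

declare(strict_types=1);

/** Wraps any iterable in a generator so arrays and Traversables are handled alike. */
function toGenerator(iterable $items): Generator
{
    yield from $items;
}

/**
 * Walks two record streams in lockstep and throws on the first difference.
 *
 * @param iterable&amp;lt;array&amp;lt;string, mixed&amp;gt;&amp;gt; $expected
 * @param iterable&amp;lt;array&amp;lt;string, mixed&amp;gt;&amp;gt; $actual
 * @return int number of records compared
 */
function assertSameRecords(iterable $expected, iterable $actual): int
{
    $left = toGenerator($expected);
    $right = toGenerator($actual);
    $index = 0;

    while ($left-&amp;gt;valid() &amp;amp;&amp;amp; $right-&amp;gt;valid()) {
        $a = $left-&amp;gt;current();
        $b = $right-&amp;gt;current();

        // Sort keys so that field order alone does not count as a difference.
        ksort($a);
        ksort($b);

        // Strict comparison: '10' and 10 are treated as different values.
        if ($a !== $b) {
            throw new UnexpectedValueException("Record #{$index} differs between implementations.");
        }

        $left-&amp;gt;next();
        $right-&amp;gt;next();
        $index++;
    }

    if ($left-&amp;gt;valid() || $right-&amp;gt;valid()) {
        throw new UnexpectedValueException("Streams have different lengths after {$index} records.");
    }

    return $index;
}

// Usage with the two extractors from this article (not executed here):
// assertSameRecords(
//     iterateOffersWithXmlReader($fixture),
//     iterateOffersWithXmlExtractKit($fixture)
// );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because both sides are consumed lazily, the check itself stays streaming and can run on the full fixture.&lt;/p&gt;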

&lt;p&gt;The point is not to “win.”&lt;/p&gt;

&lt;p&gt;The point is to understand the tradeoff for a specific extraction task.&lt;/p&gt;

&lt;h2&gt;
  
  
  My interpretation of this comparison
&lt;/h2&gt;

&lt;p&gt;This is how I would read the result after running the benchmark.&lt;/p&gt;

&lt;p&gt;If raw &lt;code&gt;XMLReader&lt;/code&gt; is slightly faster but the difference is small, I would still strongly consider &lt;code&gt;XmlExtractKit&lt;/code&gt; for repeated integration work because the extraction code is easier to reason about and easier to reuse.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;XmlExtractKit&lt;/code&gt; comes out close on runtime and similar on peak memory, that is already a strong result for the library, because it means the higher-level extraction model does not buy its convenience at an unreasonable cost in runtime or memory.&lt;/p&gt;

&lt;p&gt;If the XML task is extremely narrow and unlikely to be reused, raw &lt;code&gt;XMLReader&lt;/code&gt; may still be the right answer.&lt;/p&gt;

&lt;p&gt;But if the workload looks like real feed processing and the extraction pattern shows up again and again, the benefit of moving from cursor choreography to extraction-oriented code becomes very tangible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The most useful XML benchmark is not “which parser is fastest in the abstract.”&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;for this exact extraction task, on this XML shape, with this output model, what do I gain in runtime, memory, first-record latency, and maintainability?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is why I think &lt;strong&gt;raw &lt;code&gt;XMLReader&lt;/code&gt; vs &lt;code&gt;XmlExtractKit&lt;/code&gt;&lt;/strong&gt; is the comparison worth making.&lt;/p&gt;

&lt;p&gt;They belong to the same real-world decision point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;write the traversal and extraction layer yourself;&lt;/li&gt;
&lt;li&gt;or keep the streaming model but use a focused library to reduce the amount of glue code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For large XML feeds in modern PHP, that is a benchmark that actually tells you something useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require sbwerewolf/xml-navigator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explore the demo project
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SbWereWolf/xml-extract-kit-demo-repo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xml-extract-kit-demo-repo
composer &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>opensource</category>
      <category>php</category>
      <category>xml</category>
      <category>parsing</category>
    </item>
    <item>
      <title>Processing Supplier and Marketplace XML Feeds in PHP</title>
      <dc:creator>Nicholas Volkhin</dc:creator>
      <pubDate>Thu, 16 Apr 2026 09:50:34 +0000</pubDate>
      <link>https://forem.com/sbwerewolf/processing-supplier-and-marketplace-xml-feeds-in-php-2ifp</link>
      <guid>https://forem.com/sbwerewolf/processing-supplier-and-marketplace-xml-feeds-in-php-2ifp</guid>
      <description>&lt;p&gt;Supplier and marketplace integrations are one of the places where XML refuses to die.&lt;/p&gt;

&lt;p&gt;That is not a complaint. It is just the shape of the problem.&lt;/p&gt;

&lt;p&gt;If you build import pipelines, catalog sync jobs, price updates, availability updates, or partner data bridges, sooner or later you will meet an XML feed that contains the data you need in a format your application does not really want.&lt;/p&gt;

&lt;p&gt;In those situations, the most important design decision is usually not about XML itself.&lt;/p&gt;

&lt;p&gt;It is about the processing model.&lt;/p&gt;

&lt;p&gt;Do you treat the feed as a document that your application should “work with”? Or do you treat it as an external transport format that should be scanned, filtered, converted, normalized, and handed off to the rest of your pipeline as plain PHP data?&lt;/p&gt;

&lt;p&gt;For supplier and marketplace feeds, I strongly prefer the second model.&lt;/p&gt;

&lt;p&gt;That is the approach behind &lt;strong&gt;XmlExtractKit&lt;/strong&gt;, published as &lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real shape of feed-processing work
&lt;/h2&gt;

&lt;p&gt;When people hear “XML parsing,” the task can sound abstract.&lt;/p&gt;

&lt;p&gt;Supplier and marketplace feeds are not abstract.&lt;/p&gt;

&lt;p&gt;They usually look more like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a large file with repeated business records;&lt;/li&gt;
&lt;li&gt;products, offers, items, categories, stock entries, prices, or media blocks;&lt;/li&gt;
&lt;li&gt;partial updates, optional fields, nested elements, repeated child tags, and attributes;&lt;/li&gt;
&lt;li&gt;a downstream pipeline that wants arrays, validated records, database writes, or queue jobs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the actual engineering task is usually not:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Parse XML.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Extract repeated records from an external feed and transform them into a predictable internal format.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That framing leads to much better implementation choices.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes feeds different from small XML documents
&lt;/h2&gt;

&lt;p&gt;Small XML documents are often handled perfectly well by convenient full-document APIs such as SimpleXML or DOM.&lt;/p&gt;

&lt;p&gt;Feeds are different for a few reasons.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. They are repetitive by nature
&lt;/h3&gt;

&lt;p&gt;A feed usually contains the same business structure again and again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;offer&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;product&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;item&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;entry&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a strong signal that you should process the XML as a sequence of records, not as one big tree you want to keep in memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. You rarely need everything
&lt;/h3&gt;

&lt;p&gt;A typical import job does not need every element in the feed.&lt;/p&gt;

&lt;p&gt;It may only need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the offer identifier;&lt;/li&gt;
&lt;li&gt;availability;&lt;/li&gt;
&lt;li&gt;price;&lt;/li&gt;
&lt;li&gt;currency;&lt;/li&gt;
&lt;li&gt;category;&lt;/li&gt;
&lt;li&gt;a few images;&lt;/li&gt;
&lt;li&gt;update timestamps;&lt;/li&gt;
&lt;li&gt;one or two custom attributes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest is often irrelevant for the current pipeline step.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The output is almost never “more XML”
&lt;/h3&gt;

&lt;p&gt;Your import layer usually wants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;associative arrays;&lt;/li&gt;
&lt;li&gt;normalized field values;&lt;/li&gt;
&lt;li&gt;database rows;&lt;/li&gt;
&lt;li&gt;JSON payloads;&lt;/li&gt;
&lt;li&gt;DTOs;&lt;/li&gt;
&lt;li&gt;queue messages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why feed work is usually an &lt;strong&gt;extraction and normalization&lt;/strong&gt; problem, not an XML-manipulation problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  A representative feed example
&lt;/h2&gt;

&lt;p&gt;Here is a simplified feed structure that is close to what many supplier and marketplace pipelines deal with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;catalog&lt;/span&gt; &lt;span class="na"&gt;generated_at=&lt;/span&gt;&lt;span class="s"&gt;"2026-04-01T08:00:00Z"&lt;/span&gt; &lt;span class="na"&gt;region=&lt;/span&gt;&lt;span class="s"&gt;"eu"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;offer&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"1001"&lt;/span&gt; &lt;span class="na"&gt;available=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;sku&amp;gt;&lt;/span&gt;KB-1001&lt;span class="nt"&gt;&amp;lt;/sku&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Mechanical Keyboard&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;brand&amp;gt;&lt;/span&gt;Acme&lt;span class="nt"&gt;&amp;lt;/brand&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;category&amp;gt;&lt;/span&gt;Keyboards&lt;span class="nt"&gt;&amp;lt;/category&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;price&lt;/span&gt; &lt;span class="na"&gt;currency=&lt;/span&gt;&lt;span class="s"&gt;"USD"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;129.90&lt;span class="nt"&gt;&amp;lt;/price&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;oldprice&lt;/span&gt; &lt;span class="na"&gt;currency=&lt;/span&gt;&lt;span class="s"&gt;"USD"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;149.90&lt;span class="nt"&gt;&amp;lt;/oldprice&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;https://cdn.example.test/kb-1001-front.jpg&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;https://cdn.example.test/kb-1001-side.jpg&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;stock&amp;gt;&lt;/span&gt;14&lt;span class="nt"&gt;&amp;lt;/stock&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/offer&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;offer&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"1002"&lt;/span&gt; &lt;span class="na"&gt;available=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;sku&amp;gt;&lt;/span&gt;MS-1002&lt;span class="nt"&gt;&amp;lt;/sku&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Wireless Mouse&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;brand&amp;gt;&lt;/span&gt;Acme&lt;span class="nt"&gt;&amp;lt;/brand&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;category&amp;gt;&lt;/span&gt;Mice&lt;span class="nt"&gt;&amp;lt;/category&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;price&lt;/span&gt; &lt;span class="na"&gt;currency=&lt;/span&gt;&lt;span class="s"&gt;"USD"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;39.90&lt;span class="nt"&gt;&amp;lt;/price&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;https://cdn.example.test/ms-1002.jpg&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;stock&amp;gt;&lt;/span&gt;0&lt;span class="nt"&gt;&amp;lt;/stock&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/offer&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/catalog&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is already enough to illustrate the real shape of the work.&lt;/p&gt;

&lt;p&gt;The import pipeline usually does not want to keep this XML structure around.&lt;/p&gt;

&lt;p&gt;It wants to turn each &lt;code&gt;&amp;lt;offer&amp;gt;&lt;/code&gt; into something like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'external_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'sku'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'KB-1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Mechanical Keyboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'brand'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Acme'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'category'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Keyboards'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'129.90'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'old_price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'149.90'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'stock'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'pictures'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'https://cdn.example.test/kb-1001-front.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'https://cdn.example.test/kb-1001-side.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the internal target.&lt;/p&gt;

&lt;p&gt;Once you are clear about that, the XML side becomes much easier to reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two stages that matter most
&lt;/h2&gt;

&lt;p&gt;For feed pipelines, I think it helps to split the work into two explicit stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: extraction
&lt;/h3&gt;

&lt;p&gt;This is where you identify the repeated record you care about and convert it into a predictable PHP structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: normalization
&lt;/h3&gt;

&lt;p&gt;This is where you adapt that structure to your own application model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rename fields;&lt;/li&gt;
&lt;li&gt;cast values;&lt;/li&gt;
&lt;li&gt;collapse optional fields;&lt;/li&gt;
&lt;li&gt;map categories;&lt;/li&gt;
&lt;li&gt;validate currency or stock rules;&lt;/li&gt;
&lt;li&gt;prepare records for persistence or messaging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trying to collapse these two stages into one giant parsing function usually makes the code harder to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why streaming is such a good fit for feeds
&lt;/h2&gt;

&lt;p&gt;Supplier and marketplace feeds are one of the best use cases for streaming XML traversal.&lt;/p&gt;

&lt;p&gt;The reasons are practical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;files can become large over time;&lt;/li&gt;
&lt;li&gt;records are naturally repeated;&lt;/li&gt;
&lt;li&gt;each record can often be processed independently;&lt;/li&gt;
&lt;li&gt;you usually do not need the whole document tree;&lt;/li&gt;
&lt;li&gt;early filtering is valuable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly where &lt;code&gt;XMLReader&lt;/code&gt; and extraction-first libraries built on top of it become useful.&lt;/p&gt;
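
&lt;p&gt;To make the “early filtering” point concrete, here is a minimal sketch that uses only the built-in &lt;code&gt;XMLReader&lt;/code&gt; and skips unavailable offers before any record is built. The inline sample feed and the &lt;code&gt;$availableIds&lt;/code&gt; variable are illustrative assumptions, not part of any library API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Illustrative inline feed; a real job would open a file or URL instead.
$xml = &amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;catalog&amp;gt;
  &amp;lt;offer id="1001" available="true"/&amp;gt;
  &amp;lt;offer id="1002" available="false"/&amp;gt;
&amp;lt;/catalog&amp;gt;
XML;

$reader = XMLReader::XML($xml);

if ($reader === false) {
    throw new RuntimeException('Cannot read XML.');
}

$availableIds = [];

while ($reader-&amp;gt;read()) {
    // Decide on the opening tag alone, before touching any child nodes.
    $isAvailableOffer = $reader-&amp;gt;nodeType === XMLReader::ELEMENT
        &amp;amp;&amp;amp; $reader-&amp;gt;name === 'offer'
        &amp;amp;&amp;amp; $reader-&amp;gt;getAttribute('available') === 'true';

    if ($isAvailableOffer) {
        $availableIds[] = $reader-&amp;gt;getAttribute('id');
    }
}

$reader-&amp;gt;close();

var_dump($availableIds);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the decision is made on the element's own attributes, rejected records never have to be converted into arrays at all.&lt;/p&gt;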

&lt;p&gt;With &lt;strong&gt;XmlExtractKit&lt;/strong&gt;, I usually approach these feeds as “find repeated offers and turn them into arrays.”&lt;/p&gt;

&lt;p&gt;Here is a streaming extraction example using &lt;code&gt;FastXmlParser::extractPrettyPrint()&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Parsing\FastXmlParser&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tempnam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sys_get_temp_dir&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'supplier-feed-'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nb"&gt;file_put_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;catalog generated_at="2026-04-01T08:00:00Z" region="eu"&amp;gt;
  &amp;lt;offer id="1001" available="true"&amp;gt;
    &amp;lt;sku&amp;gt;KB-1001&amp;lt;/sku&amp;gt;
    &amp;lt;name&amp;gt;Mechanical Keyboard&amp;lt;/name&amp;gt;
    &amp;lt;brand&amp;gt;Acme&amp;lt;/brand&amp;gt;
    &amp;lt;category&amp;gt;Keyboards&amp;lt;/category&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;129.90&amp;lt;/price&amp;gt;
    &amp;lt;picture&amp;gt;https://cdn.example.test/kb-1001-front.jpg&amp;lt;/picture&amp;gt;
    &amp;lt;picture&amp;gt;https://cdn.example.test/kb-1001-side.jpg&amp;lt;/picture&amp;gt;
    &amp;lt;stock&amp;gt;14&amp;lt;/stock&amp;gt;
  &amp;lt;/offer&amp;gt;
  &amp;lt;service id="svc-1"&amp;gt;
    &amp;lt;name&amp;gt;Extended Warranty&amp;lt;/name&amp;gt;
  &amp;lt;/service&amp;gt;
  &amp;lt;offer id="1002" available="false"&amp;gt;
    &amp;lt;sku&amp;gt;MS-1002&amp;lt;/sku&amp;gt;
    &amp;lt;name&amp;gt;Wireless Mouse&amp;lt;/name&amp;gt;
    &amp;lt;brand&amp;gt;Acme&amp;lt;/brand&amp;gt;
    &amp;lt;category&amp;gt;Mice&amp;lt;/category&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;39.90&amp;lt;/price&amp;gt;
    &amp;lt;picture&amp;gt;https://cdn.example.test/ms-1002.jpg&amp;lt;/picture&amp;gt;
    &amp;lt;stock&amp;gt;0&amp;lt;/stock&amp;gt;
  &amp;lt;/offer&amp;gt;
&amp;lt;/catalog&amp;gt;
XML&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML feed.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastXmlParser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;extractPrettyPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XMLReader&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nb"&gt;json_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="no"&gt;JSON_PRETTY_PRINT&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;JSON_UNESCAPED_SLASHES&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;unlink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That extraction result is already close to what the rest of the application needs.&lt;/p&gt;

&lt;p&gt;It is still XML-derived data, but it is no longer trapped in XML traversal logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why readable arrays help so much in feed work
&lt;/h2&gt;

&lt;p&gt;In feed-processing pipelines, readability is not cosmetic.&lt;/p&gt;

&lt;p&gt;It directly affects how quickly you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inspect bad records;&lt;/li&gt;
&lt;li&gt;log partial failures;&lt;/li&gt;
&lt;li&gt;test normalization rules;&lt;/li&gt;
&lt;li&gt;compare incoming and outgoing payloads;&lt;/li&gt;
&lt;li&gt;reason about optional fields;&lt;/li&gt;
&lt;li&gt;support multiple partner formats.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why array output is so practical.&lt;/p&gt;

&lt;p&gt;For example, one extracted &lt;code&gt;&amp;lt;offer&amp;gt;&lt;/code&gt; might look like this after &lt;code&gt;extractPrettyPrint()&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'offer'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'@attributes'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'sku'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'KB-1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Mechanical Keyboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'brand'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Acme'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'category'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Keyboards'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'@value'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'129.90'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'@attributes'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'picture'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'https://cdn.example.test/kb-1001-front.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'https://cdn.example.test/kb-1001-side.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'stock'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'14'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a much better input for normalization code than a half-processed XML cursor state.&lt;/p&gt;

&lt;h2&gt;
  
  
  The normalization step is where your business rules belong
&lt;/h2&gt;

&lt;p&gt;Once the feed record is in array form, you can normalize it with ordinary PHP code.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cd"&gt;/**
 * @param array&amp;lt;string, mixed&amp;gt; $record
 * @return array&amp;lt;string, mixed&amp;gt;
 */&lt;/span&gt;
&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;normalizeOffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'offer'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="nv"&gt;$pictures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'picture'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;is_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$pictures&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$pictures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$pictures&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'external_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'@attributes'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'@attributes'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'available'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;'false'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'sku'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'sku'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'brand'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'brand'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'category'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'category'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'@value'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'@attributes'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'stock'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;isset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'stock'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'stock'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'pictures'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;array_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;array_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$pictures&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'is_string'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where business logic belongs.&lt;/p&gt;

&lt;p&gt;Not in low-level XML traversal. Not in cursor movement. Not in string fragments.&lt;/p&gt;

&lt;p&gt;A clean import architecture keeps those concerns separate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repeated tags, attributes, and optional fields are not edge cases
&lt;/h2&gt;

&lt;p&gt;In feed processing, these are normal conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiple images;&lt;/li&gt;
&lt;li&gt;optional old price;&lt;/li&gt;
&lt;li&gt;empty stock fields;&lt;/li&gt;
&lt;li&gt;attributes that carry business meaning;&lt;/li&gt;
&lt;li&gt;tags that are present for some suppliers and absent for others.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is another reason I prefer extraction to arrays early.&lt;/p&gt;

&lt;p&gt;Once the record is in a stable PHP structure, handling these cases becomes straightforward.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;default missing fields;&lt;/li&gt;
&lt;li&gt;cast types;&lt;/li&gt;
&lt;li&gt;merge repeated tags into lists;&lt;/li&gt;
&lt;li&gt;strip noise;&lt;/li&gt;
&lt;li&gt;build validation rules around familiar array shapes.&lt;/li&gt;
&lt;/ul&gt;
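
&lt;p&gt;As a rough illustration of the list above, the following sketch shows two small helpers for repeated tags and empty numeric fields. The names &lt;code&gt;toList()&lt;/code&gt; and &lt;code&gt;toNullableInt()&lt;/code&gt; are hypothetical, the sample record is made up, and the code assumes PHP 8.1 or newer because it relies on &lt;code&gt;array_is_list()&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;/**
 * Hypothetical helper: a tag that appears once may arrive as a single
 * value, while repeated tags arrive as a list. Always return a list.
 */
function toList(mixed $value): array
{
    if ($value === null || $value === '') {
        return [];
    }

    // Associative arrays (for example a value plus its attributes)
    // count as one item and are wrapped rather than returned as-is.
    return is_array($value) &amp;amp;&amp;amp; array_is_list($value) ? $value : [$value];
}

/**
 * Hypothetical helper: cast an integer-looking string to int and treat
 * empty or malformed input as unknown (null) instead of storing 0.
 */
function toNullableInt(mixed $value): ?int
{
    if (! is_string($value)) {
        return null;
    }

    $trimmed = trim($value);

    return preg_match('/^-?\d+$/', $trimmed) === 1 ? (int) $trimmed : null;
}

// Made-up record: one picture, an empty stock tag, no old price.
$offer = [
    'picture' =&amp;gt; 'https://cdn.example.test/ms-1002.jpg',
    'stock' =&amp;gt; '',
];

$pictures = toList($offer['picture'] ?? null);
$stock = toNullableInt($offer['stock'] ?? null);
$oldPrice = $offer['oldprice']['@value'] ?? null;

var_dump($pictures, $stock, $oldPrice);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping such helpers small and free of XML concerns makes them easy to unit test against plain arrays.&lt;/p&gt;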

&lt;h2&gt;
  
  
  One feed is manageable. Ten feeds expose architecture problems
&lt;/h2&gt;

&lt;p&gt;A lot of parsing approaches look acceptable when there is only one partner.&lt;/p&gt;

&lt;p&gt;The trouble begins when the system grows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supplier A sends &lt;code&gt;offer&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;supplier B sends &lt;code&gt;item&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;marketplace C adds nested media blocks;&lt;/li&gt;
&lt;li&gt;another feed uses attributes where the previous one used child elements;&lt;/li&gt;
&lt;li&gt;one integration sends a full nightly catalog;&lt;/li&gt;
&lt;li&gt;another sends partial incremental updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, the quality of your processing model matters much more than the convenience of a single parser call.&lt;/p&gt;
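
&lt;p&gt;One possible way to absorb the differences between partners is to keep them in configuration rather than in code paths. The sketch below is only an assumption about how that could look: the partner keys, the &lt;code&gt;record_tag&lt;/code&gt; setting, and the &lt;code&gt;recordDetector()&lt;/code&gt; function are hypothetical names, and the returned callback simply mirrors the shape of the detector closure used in the extraction example earlier.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical per-partner settings; keys and tag names are illustrative.
$partners = [
    'supplier_a' =&amp;gt; ['record_tag' =&amp;gt; 'offer'],
    'supplier_b' =&amp;gt; ['record_tag' =&amp;gt; 'item'],
];

/**
 * Build a callback that recognizes the opening tag of one partner's
 * repeated record element.
 */
function recordDetector(string $tag): Closure
{
    return static fn (XMLReader $cursor): bool =&amp;gt;
        $cursor-&amp;gt;nodeType === XMLReader::ELEMENT
        &amp;amp;&amp;amp; $cursor-&amp;gt;name === $tag;
}

$isRecord = recordDetector($partners['supplier_b']['record_tag']);

// Quick check against a tiny made-up feed using plain XMLReader.
$reader = XMLReader::XML('&amp;lt;feed&amp;gt;&amp;lt;item id="1"/&amp;gt;&amp;lt;note/&amp;gt;&amp;lt;item id="2"/&amp;gt;&amp;lt;/feed&amp;gt;');

if ($reader === false) {
    throw new RuntimeException('Cannot read XML.');
}

$matches = 0;

while ($reader-&amp;gt;read()) {
    if ($isRecord($reader)) {
        $matches++;
    }
}

$reader-&amp;gt;close();

echo $matches . PHP_EOL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this shape, adding a new partner is mostly a matter of adding a configuration entry and a normalization function.&lt;/p&gt;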

&lt;p&gt;The goal is not just “parse this file.”&lt;/p&gt;

&lt;p&gt;The goal is to build a repeatable pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;extract repeated records;&lt;/li&gt;
&lt;li&gt;convert them into stable PHP structures;&lt;/li&gt;
&lt;li&gt;normalize them into your domain shape;&lt;/li&gt;
&lt;li&gt;pass them downstream.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That pattern scales much better than spreading XML handling rules throughout the codebase.&lt;/p&gt;
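
&lt;p&gt;The four steps above can be sketched as a lazy pipeline. This is a minimal illustration under stated assumptions: &lt;code&gt;normalizeRecords()&lt;/code&gt; is a hypothetical function, the &lt;code&gt;$extracted&lt;/code&gt; array stands in for whatever the extraction step yields, and the normalizer is deliberately reduced to two fields.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;/**
 * Hypothetical pipeline step: normalize records one at a time so the
 * full feed never has to sit in memory.
 *
 * @param iterable&amp;lt;array&amp;lt;string, mixed&amp;gt;&amp;gt; $records
 * @param callable(array&amp;lt;string, mixed&amp;gt;): array&amp;lt;string, mixed&amp;gt; $normalize
 * @return Generator&amp;lt;int, array&amp;lt;string, mixed&amp;gt;&amp;gt;
 */
function normalizeRecords(iterable $records, callable $normalize): Generator
{
    foreach ($records as $record) {
        yield $normalize($record);
    }
}

// Stand-in for the extraction step; in a real job this would be the
// iterable produced by the streaming extractor.
$extracted = [
    ['offer' =&amp;gt; ['@attributes' =&amp;gt; ['id' =&amp;gt; '1001'], 'stock' =&amp;gt; '14']],
    ['offer' =&amp;gt; ['@attributes' =&amp;gt; ['id' =&amp;gt; '1002'], 'stock' =&amp;gt; '0']],
];

// Deliberately tiny normalizer; real rules would live in their own layer.
$normalize = static fn (array $record): array =&amp;gt; [
    'external_id' =&amp;gt; $record['offer']['@attributes']['id'] ?? null,
    'stock' =&amp;gt; (int) ($record['offer']['stock'] ?? 0),
];

foreach (normalizeRecords($extracted, $normalize) as $row) {
    // Downstream hand-off: persist, enqueue, or index the row here.
    echo json_encode($row) . PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because each stage only sees an iterable of arrays, the extraction source can change without touching normalization or the downstream consumers.&lt;/p&gt;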

&lt;h2&gt;
  
  
  A useful split for real projects
&lt;/h2&gt;

&lt;p&gt;For supplier and marketplace XML feeds, I think the cleanest split is this:&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration edge
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;read the XML stream;&lt;/li&gt;
&lt;li&gt;extract only target records;&lt;/li&gt;
&lt;li&gt;convert them into arrays.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Normalization layer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;cast and validate fields;&lt;/li&gt;
&lt;li&gt;reconcile naming differences;&lt;/li&gt;
&lt;li&gt;apply partner-specific mapping rules;&lt;/li&gt;
&lt;li&gt;create consistent internal records.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Application layer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;persist catalog data;&lt;/li&gt;
&lt;li&gt;emit events;&lt;/li&gt;
&lt;li&gt;update search indexes;&lt;/li&gt;
&lt;li&gt;enqueue downstream jobs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps XML where it belongs: at the edge.&lt;/p&gt;
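
&lt;p&gt;One way the normalization layer could reconcile naming differences is a per-partner field map. The sketch below is illustrative only: the partner keys, the field names, and &lt;code&gt;mapPartnerRecord()&lt;/code&gt; are assumptions, not part of XmlExtractKit:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

// Hypothetical per-partner maps: internal field =&amp;gt; partner field.
const FIELD_MAPS = [
    'supplier_a' =&amp;gt; ['sku' =&amp;gt; 'id', 'title' =&amp;gt; 'name', 'price' =&amp;gt; 'price'],
    'supplier_b' =&amp;gt; ['sku' =&amp;gt; 'code', 'title' =&amp;gt; 'label', 'price' =&amp;gt; 'cost'],
];

function mapPartnerRecord(string $partner, array $record): array
{
    $map = FIELD_MAPS[$partner] ?? null;
    if ($map === null) {
        throw new InvalidArgumentException("Unknown partner: {$partner}");
    }

    $internal = [];
    foreach ($map as $internalField =&amp;gt; $partnerField) {
        // Fields the partner did not send become null.
        $internal[$internalField] = $record[$partnerField] ?? null;
    }

    return $internal;
}

$a = mapPartnerRecord('supplier_a', ['id' =&amp;gt; '1001', 'name' =&amp;gt; 'Keyboard', 'price' =&amp;gt; '49.00']);
$b = mapPartnerRecord('supplier_b', ['code' =&amp;gt; '1001', 'label' =&amp;gt; 'Keyboard']);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both calls return the same internal keys, so the application layer never sees partner-specific names.&lt;/p&gt;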

&lt;h2&gt;
  
  
  When a full-document approach is still fine
&lt;/h2&gt;

&lt;p&gt;Not every feed needs streaming.&lt;/p&gt;

&lt;p&gt;If the XML is small and the structure is simple, a full-document approach may be completely acceptable.&lt;/p&gt;

&lt;p&gt;But supplier and marketplace integrations tend to drift in one direction over time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more records;&lt;/li&gt;
&lt;li&gt;more nested data;&lt;/li&gt;
&lt;li&gt;more optional fields;&lt;/li&gt;
&lt;li&gt;more partner variants;&lt;/li&gt;
&lt;li&gt;more operational pressure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why an extraction-first model is often the safer default.&lt;/p&gt;

&lt;p&gt;It is not about premature optimization.&lt;/p&gt;

&lt;p&gt;It is about choosing a processing pattern that continues to work when the feed stops being toy-sized.&lt;/p&gt;
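
&lt;p&gt;To illustrate what record-by-record extraction means in practice, here is a sketch built on PHP's bundled &lt;code&gt;XMLReader&lt;/code&gt; and SimpleXML extensions. This is not XmlExtractKit's API. &lt;code&gt;streamRecords()&lt;/code&gt; is a hypothetical helper, and the XML is passed as a string only to keep the example self-contained:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

// Hypothetical helper: yields one SimpleXMLElement per matching record.
function streamRecords(string $xml, string $tag): Generator
{
    $reader = new XMLReader();
    $reader-&amp;gt;XML($xml);

    // Move the cursor to the first element with the target name.
    while ($reader-&amp;gt;read()) {
        if ($reader-&amp;gt;nodeType === XMLReader::ELEMENT &amp;amp;&amp;amp; $reader-&amp;gt;name === $tag) {
            break;
        }
    }

    // Only the current record is materialized; next() skips over its subtree.
    while ($reader-&amp;gt;nodeType === XMLReader::ELEMENT &amp;amp;&amp;amp; $reader-&amp;gt;name === $tag) {
        yield simplexml_load_string($reader-&amp;gt;readOuterXml());
        $reader-&amp;gt;next($tag);
    }

    $reader-&amp;gt;close();
}

$xml = '&amp;lt;feed&amp;gt;'
    . '&amp;lt;offer id="1"&amp;gt;&amp;lt;name&amp;gt;Dock&amp;lt;/name&amp;gt;&amp;lt;/offer&amp;gt;'
    . '&amp;lt;offer id="2"&amp;gt;&amp;lt;name&amp;gt;Hub&amp;lt;/name&amp;gt;&amp;lt;/offer&amp;gt;'
    . '&amp;lt;/feed&amp;gt;';

$names = [];
foreach (streamRecords($xml, 'offer') as $offer) {
    $names[] = (string) $offer-&amp;gt;name;
}

var_export($names);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The loop body stays the same whether the feed holds two offers or two million, which is the property that matters once the feed grows.&lt;/p&gt;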

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Supplier and marketplace XML feeds are rarely difficult because XML is mysterious.&lt;/p&gt;

&lt;p&gt;They are difficult because they combine repetition, size, optional structure, external control, and business-specific normalization rules.&lt;/p&gt;

&lt;p&gt;That is why I think the most productive way to handle them in PHP is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stream the feed when needed;&lt;/li&gt;
&lt;li&gt;extract repeated records instead of loading everything;&lt;/li&gt;
&lt;li&gt;convert XML into plain arrays early;&lt;/li&gt;
&lt;li&gt;keep normalization and business rules outside low-level XML traversal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the workflow I wanted from &lt;strong&gt;XmlExtractKit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not a giant XML abstraction layer. Not an attempt to make XML pleasant.&lt;/p&gt;

&lt;p&gt;Just a practical path from external XML feeds to application-ready PHP data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require sbwerewolf/xml-navigator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explore the demo project
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SbWereWolf/xml-extract-kit-demo-repo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xml-extract-kit-demo-repo
composer &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>opensource</category>
      <category>php</category>
      <category>xml</category>
      <category>parsing</category>
    </item>
    <item>
      <title>Converting XML Feeds to Plain PHP Arrays in Modern PHP</title>
      <dc:creator>Nicholas Volkhin</dc:creator>
      <pubDate>Wed, 15 Apr 2026 19:20:07 +0000</pubDate>
      <link>https://forem.com/sbwerewolf/converting-xml-feeds-to-plain-php-arrays-in-modern-php-26j1</link>
      <guid>https://forem.com/sbwerewolf/converting-xml-feeds-to-plain-php-arrays-in-modern-php-26j1</guid>
      <description>&lt;p&gt;When people say they need to “work with XML” in PHP, that phrasing is often already slightly misleading.&lt;/p&gt;

&lt;p&gt;In most business applications, XML is not the format you actually want to keep around.&lt;/p&gt;

&lt;p&gt;It is just the format you received.&lt;/p&gt;

&lt;p&gt;A supplier feed arrives as XML. A marketplace export arrives as XML. A partner integration still speaks XML. A legacy endpoint responds with XML. But once the data enters your application, the rest of the code usually does not want an XML tree.&lt;/p&gt;

&lt;p&gt;It wants ordinary PHP data.&lt;/p&gt;

&lt;p&gt;That is the practical framing I use in modern PHP projects:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;XML is usually a transport format. The real goal is to convert the useful parts into plain PHP arrays as early as possible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you look at the problem this way, a lot of implementation decisions become much clearer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why arrays are usually the real target
&lt;/h2&gt;

&lt;p&gt;Most application code does not benefit from carrying XML semantics deeper into the stack than necessary.&lt;/p&gt;

&lt;p&gt;Your service layer, validation logic, queue payloads, DTO mappers, logging, database writers, and JSON APIs usually work best with plain associative arrays.&lt;/p&gt;

&lt;p&gt;That means the useful pipeline often looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;XML feed → extracted records → plain PHP arrays → validation / normalization / persistence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one of the reasons I built &lt;strong&gt;XmlExtractKit&lt;/strong&gt; for PHP, published as &lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The package is designed around a very boring but very common need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take XML input;&lt;/li&gt;
&lt;li&gt;extract the records that matter;&lt;/li&gt;
&lt;li&gt;get plain PHP arrays back;&lt;/li&gt;
&lt;li&gt;keep the rest of the application free from low-level XML handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a better fit for modern application code than dragging cursor logic or DOM structures through multiple layers.&lt;/p&gt;
&lt;h2&gt;
  
  
  A typical XML feed problem
&lt;/h2&gt;

&lt;p&gt;Suppose a partner sends you a product feed like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;feed&lt;/span&gt; &lt;span class="na"&gt;generated_at=&lt;/span&gt;&lt;span class="s"&gt;"2026-03-28T09:00:00Z"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;offer&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"206111"&lt;/span&gt; &lt;span class="na"&gt;available=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;USB-C Dock&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;price&lt;/span&gt; &lt;span class="na"&gt;currency=&lt;/span&gt;&lt;span class="s"&gt;"USD"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;129.90&lt;span class="nt"&gt;&amp;lt;/price&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;https://cdn.example.test/1.jpg&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;https://cdn.example.test/2.jpg&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/offer&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/feed&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What does the rest of your application usually want from this?&lt;/p&gt;

&lt;p&gt;Not an XML tree.&lt;/p&gt;

&lt;p&gt;Usually something closer to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'feed'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'@attributes'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'generated_at'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-28T09:00:00Z'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'offer'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'@attributes'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'206111'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USB-C Dock'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'@value'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'129.90'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'@attributes'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'picture'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'https://cdn.example.test/1.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'https://cdn.example.test/2.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That structure is already much more useful.&lt;/p&gt;

&lt;p&gt;You can serialize it, validate it, map it to a DTO, send it to a queue, store it, or normalize it further.&lt;/p&gt;
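
&lt;p&gt;For instance, mapping the &lt;code&gt;offer&lt;/code&gt; part of the array above to a DTO might look like the following sketch. The &lt;code&gt;Offer&lt;/code&gt; class and its &lt;code&gt;fromArray()&lt;/code&gt; factory are hypothetical, and the code assumes PHP 8.1+ for readonly promoted properties:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

// Hypothetical DTO; not part of XmlExtractKit.
final class Offer
{
    /** @param string[] $pictures */
    public function __construct(
        public readonly int $id,
        public readonly string $name,
        public readonly float $price,
        public readonly string $currency,
        public readonly array $pictures,
    ) {
    }

    public static function fromArray(array $offer): self
    {
        return new self(
            id: (int) $offer['@attributes']['id'],
            name: $offer['name'],
            price: (float) $offer['price']['@value'],
            currency: $offer['price']['@attributes']['currency'],
            // Assumption: a single picture may come back as a plain string.
            pictures: (array) ($offer['picture'] ?? []),
        );
    }
}

$offer = Offer::fromArray([
    '@attributes' =&amp;gt; ['id' =&amp;gt; '206111', 'available' =&amp;gt; 'true'],
    'name' =&amp;gt; 'USB-C Dock',
    'price' =&amp;gt; ['@value' =&amp;gt; '129.90', '@attributes' =&amp;gt; ['currency' =&amp;gt; 'USD']],
    'picture' =&amp;gt; [
        'https://cdn.example.test/1.jpg',
        'https://cdn.example.test/2.jpg',
    ],
]);

echo json_encode($offer, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;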

&lt;p&gt;That is why I think “XML to arrays” is a much more practical category than “XML processing” for a lot of real PHP work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first decision: readable arrays or normalized hierarchy
&lt;/h2&gt;

&lt;p&gt;One thing I like about XmlExtractKit is that it makes this tradeoff explicit.&lt;/p&gt;

&lt;p&gt;There are two main output styles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;readable output&lt;/strong&gt;, via &lt;code&gt;FastXmlToArray::prettyPrint()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;normalized output&lt;/strong&gt;, via &lt;code&gt;FastXmlToArray::convert()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They solve related but different problems.&lt;/p&gt;
&lt;h3&gt;
  
  
  Readable output: best for application code
&lt;/h3&gt;

&lt;p&gt;If your goal is to move XML into ordinary PHP code quickly, readable arrays are usually the right default.&lt;/p&gt;

&lt;p&gt;Here is a direct conversion example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Conversion\FastXmlToArray&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;feed generated_at="2026-03-28T09:00:00Z"&amp;gt;
  &amp;lt;offer id="206111" available="true"&amp;gt;
    &amp;lt;name&amp;gt;USB-C Dock&amp;lt;/name&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;129.90&amp;lt;/price&amp;gt;
    &amp;lt;picture&amp;gt;https://cdn.example.test/1.jpg&amp;lt;/picture&amp;gt;
    &amp;lt;picture&amp;gt;https://cdn.example.test/2.jpg&amp;lt;/picture&amp;gt;
  &amp;lt;/offer&amp;gt;
&amp;lt;/feed&amp;gt;
XML;&lt;/span&gt;

&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastXmlToArray&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;prettyPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nb"&gt;json_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="no"&gt;JSON_PRETTY_PRINT&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;JSON_UNESCAPED_SLASHES&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"feed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"generated_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-28T09:00:00Z"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"offer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"206111"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USB-C Dock"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"@value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"129.90"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"picture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"https://cdn.example.test/1.jpg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"https://cdn.example.test/2.jpg"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This output format is deliberately shaped for direct use in application code.&lt;/p&gt;

&lt;p&gt;It is useful when you want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;return a structured payload from a service;&lt;/li&gt;
&lt;li&gt;serialize data to JSON;&lt;/li&gt;
&lt;li&gt;inspect logs or debug dumps;&lt;/li&gt;
&lt;li&gt;pass a transformed record into validation or normalization code;&lt;/li&gt;
&lt;li&gt;feed the result into downstream application logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The array shape follows a few simple rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;attributes go under &lt;code&gt;@attributes&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;element text goes under &lt;code&gt;@value&lt;/code&gt; when attributes are also present;&lt;/li&gt;
&lt;li&gt;repeated child tags become indexed arrays.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the kind of shape that works well in typical modern PHP code.&lt;/p&gt;
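
&lt;p&gt;These rules imply that the same tag can show up as a bare string, as an array with &lt;code&gt;@value&lt;/code&gt;, or as an indexed list, depending on the input. Below is a small sketch of defensive readers for that shape. &lt;code&gt;textOf()&lt;/code&gt; and &lt;code&gt;listOf()&lt;/code&gt; are hypothetical helpers (PHP 8.1+ for &lt;code&gt;array_is_list()&lt;/code&gt;), and the single-versus-repeated behavior is an assumption inferred from the rules above, not a documented guarantee:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

// Text of a node, whether it is a bare string or carries '@value'.
function textOf(mixed $node): ?string
{
    if (is_array($node)) {
        return isset($node['@value']) ? (string) $node['@value'] : null;
    }

    return $node === null ? null : (string) $node;
}

// Always returns an indexed list, even for zero or one occurrence.
function listOf(mixed $node): array
{
    if ($node === null) {
        return [];
    }

    // An indexed array means the tag was repeated; anything else is one node.
    return is_array($node) &amp;amp;&amp;amp; array_is_list($node) ? $node : [$node];
}

$offer = [
    'name' =&amp;gt; 'USB-C Dock',
    'price' =&amp;gt; ['@value' =&amp;gt; '129.90', '@attributes' =&amp;gt; ['currency' =&amp;gt; 'USD']],
    'picture' =&amp;gt; 'https://cdn.example.test/1.jpg',
];

var_export([
    'name' =&amp;gt; textOf($offer['name']),
    'price' =&amp;gt; textOf($offer['price']),
    'pictures' =&amp;gt; listOf($offer['picture']),
]);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;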

&lt;h2&gt;
  
  
  When normalized output is the better choice
&lt;/h2&gt;

&lt;p&gt;Readable output is great for many pipelines, but sometimes you want a structure that is more explicit and more stable for traversal.&lt;/p&gt;

&lt;p&gt;That is where &lt;code&gt;FastXmlToArray::convert()&lt;/code&gt; comes in.&lt;/p&gt;

&lt;p&gt;Instead of optimizing for immediate readability, it gives each node the same predictable contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;n&lt;/code&gt; = element name;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;v&lt;/code&gt; = direct value;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;a&lt;/code&gt; = attributes;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;s&lt;/code&gt; = child sequence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the same feed converted into normalized hierarchy form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Conversion\FastXmlToArray&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;feed generated_at="2026-03-28T09:00:00Z"&amp;gt;
  &amp;lt;offer id="206111" available="true"&amp;gt;
    &amp;lt;name&amp;gt;USB-C Dock&amp;lt;/name&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;129.90&amp;lt;/price&amp;gt;
    &amp;lt;picture&amp;gt;https://cdn.example.test/1.jpg&amp;lt;/picture&amp;gt;
    &amp;lt;picture&amp;gt;https://cdn.example.test/2.jpg&amp;lt;/picture&amp;gt;
  &amp;lt;/offer&amp;gt;
&amp;lt;/feed&amp;gt;
XML;&lt;/span&gt;

&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastXmlToArray&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nb"&gt;var_export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'n'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'feed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'a'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'generated_at'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-28T09:00:00Z'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="s1"&gt;'s'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s1"&gt;'n'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s1"&gt;'a'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
      &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'206111'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="s1"&gt;'s'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
      &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="s1"&gt;'n'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s1"&gt;'v'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USB-C Dock'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="s1"&gt;'n'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s1"&gt;'v'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'129.90'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s1"&gt;'a'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
          &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="s1"&gt;'n'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'picture'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s1"&gt;'v'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'https://cdn.example.test/1.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="s1"&gt;'n'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'picture'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s1"&gt;'v'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'https://cdn.example.test/2.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This output is not as immediately pleasant to read, but it is very useful when you care about consistent traversal and adapters.&lt;/p&gt;

&lt;p&gt;That becomes valuable when you want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;build wrappers on top of a stable node contract;&lt;/li&gt;
&lt;li&gt;walk the structure programmatically;&lt;/li&gt;
&lt;li&gt;distinguish explicitly between element names, values, attributes, and children;&lt;/li&gt;
&lt;li&gt;create internal tooling that should not depend on the shape of one specific XML document.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, &lt;code&gt;prettyPrint()&lt;/code&gt; is great when the output is the destination. &lt;code&gt;convert()&lt;/code&gt; is great when the output is an intermediate representation.&lt;/p&gt;
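
&lt;p&gt;Because every node shares the same keys, a generic walker stays short. The sketch below collects the values of all elements with a given name. &lt;code&gt;collectValues()&lt;/code&gt; is a hypothetical helper, and the input is a hand-written array in the &lt;code&gt;n&lt;/code&gt;/&lt;code&gt;v&lt;/code&gt;/&lt;code&gt;a&lt;/code&gt;/&lt;code&gt;s&lt;/code&gt; shape shown above, not real library output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php

// Depth-first search over the n/v/a/s node contract.
function collectValues(array $node, string $name): array
{
    $found = [];
    if (($node['n'] ?? null) === $name &amp;amp;&amp;amp; isset($node['v'])) {
        $found[] = $node['v'];
    }

    // 's' is absent on leaf nodes, so default to an empty child list.
    foreach ($node['s'] ?? [] as $child) {
        $found = array_merge($found, collectValues($child, $name));
    }

    return $found;
}

$tree = [
    'n' =&amp;gt; 'offer',
    'a' =&amp;gt; ['id' =&amp;gt; '206111'],
    's' =&amp;gt; [
        ['n' =&amp;gt; 'name', 'v' =&amp;gt; 'USB-C Dock'],
        ['n' =&amp;gt; 'picture', 'v' =&amp;gt; 'https://cdn.example.test/1.jpg'],
        ['n' =&amp;gt; 'picture', 'v' =&amp;gt; 'https://cdn.example.test/2.jpg'],
    ],
];

$pictures = collectValues($tree, 'picture');

var_export($pictures);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The walker never mentions &lt;code&gt;offer&lt;/code&gt; or &lt;code&gt;picture&lt;/code&gt; in its body, which is what makes this form suitable for tooling that has to work across different feeds.&lt;/p&gt;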

&lt;h2&gt;
  
  
  Arrays are only useful if they are easy to navigate
&lt;/h2&gt;

&lt;p&gt;Sometimes a plain array is enough.&lt;/p&gt;

&lt;p&gt;Sometimes you want something slightly higher-level without going back to low-level XML logic.&lt;/p&gt;

&lt;p&gt;That is where &lt;code&gt;XmlElement&lt;/code&gt; fits very nicely.&lt;/p&gt;

&lt;p&gt;You can take the normalized hierarchy returned by &lt;code&gt;FastXmlToArray::convert()&lt;/code&gt; and wrap it in &lt;code&gt;XmlElement&lt;/code&gt; for convenient traversal.&lt;/p&gt;

&lt;p&gt;Here is a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Conversion\FastXmlToArray&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Navigation\XmlElement&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;catalog region="eu"&amp;gt;
  &amp;lt;offer id="1001" available="true"&amp;gt;
    &amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;
    &amp;lt;tag&amp;gt;office&amp;lt;/tag&amp;gt;
    &amp;lt;tag&amp;gt;usb&amp;lt;/tag&amp;gt;
  &amp;lt;/offer&amp;gt;
&amp;lt;/catalog&amp;gt;
XML;&lt;/span&gt;

&lt;span class="nv"&gt;$root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;XmlElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;FastXmlToArray&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="nv"&gt;$offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$root&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'offer'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;current&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$root&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$root&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'region'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$root&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;hasElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'offer'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'yes'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'no'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$attribute&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$attribute&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'='&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$attribute&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$tagValues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;array_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XmlElement&lt;/span&gt; &lt;span class="nv"&gt;$tag&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$tag&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tag'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nb"&gt;var_export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$tagValues&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a useful middle ground.&lt;/p&gt;

&lt;p&gt;The data is still array-based and application-friendly, but navigation becomes clearer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;name()&lt;/code&gt; for the current element name;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get()&lt;/code&gt; for attributes;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;hasElement()&lt;/code&gt; to check for children;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pull()&lt;/code&gt; or &lt;code&gt;elements()&lt;/code&gt; to navigate down the structure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is often cleaner than passing around raw nested arrays with hardcoded indexes everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters in feed processing
&lt;/h2&gt;

&lt;p&gt;Feed processing is usually repetitive.&lt;/p&gt;

&lt;p&gt;You receive XML, extract records, normalize them, validate them, and push them further into the pipeline.&lt;/p&gt;

&lt;p&gt;That means the most practical XML question is often not:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Which library can represent XML most completely?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Which approach gets me from XML to application-ready records with the least friction?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is why plain PHP arrays are such a strong target format for feed work.&lt;/p&gt;

&lt;p&gt;They are easy to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inspect;&lt;/li&gt;
&lt;li&gt;serialize;&lt;/li&gt;
&lt;li&gt;compare in tests;&lt;/li&gt;
&lt;li&gt;transform;&lt;/li&gt;
&lt;li&gt;validate;&lt;/li&gt;
&lt;li&gt;store;&lt;/li&gt;
&lt;li&gt;hand off to other services.&lt;/li&gt;
&lt;/ul&gt;
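
&lt;p&gt;A minimal sketch of what “easy to inspect, compare, and serialize” can look like in practice. The record below is hypothetical; its field names are illustrative and not a format the library guarantees:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical record, shaped like one offer extracted from a feed.
$offer = ['id' =&amp;gt; '1001', 'name' =&amp;gt; 'Keyboard', 'price' =&amp;gt; 49.90];

// Inspect: a single call prints the whole record.
var_export($offer);
echo PHP_EOL;

// Compare in tests: === checks keys, values, types, and order.
$expected = ['id' =&amp;gt; '1001', 'name' =&amp;gt; 'Keyboard', 'price' =&amp;gt; 49.90];
var_dump($offer === $expected); // bool(true)

// Serialize: hand the record to a queue, a log, or another service.
$json = json_encode($offer, JSON_THROW_ON_ERROR);
echo $json . PHP_EOL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;None of these steps needs an XML API, which is the point of converting early.&lt;/p&gt;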

&lt;p&gt;By contrast, keeping XML structures alive deep into the business layer usually increases the amount of incidental complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about large feeds?
&lt;/h2&gt;

&lt;p&gt;For large XML feeds, the array-conversion story should not force you back into full-document loading.&lt;/p&gt;

&lt;p&gt;This is where the streaming entry points matter.&lt;/p&gt;

&lt;p&gt;If you want readable application-friendly output directly from selected nodes in a large document, there is &lt;code&gt;FastXmlParser::extractPrettyPrint()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is a compact example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Parsing\FastXmlParser&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tempnam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sys_get_temp_dir&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'xml-extract-kit-'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nb"&gt;file_put_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;catalog&amp;gt;
  &amp;lt;offer id="1"&amp;gt;
    &amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;
    &amp;lt;price&amp;gt;49.90&amp;lt;/price&amp;gt;
  &amp;lt;/offer&amp;gt;
  &amp;lt;service id="s-1"&amp;gt;
    &amp;lt;name&amp;gt;Warranty&amp;lt;/name&amp;gt;
  &amp;lt;/service&amp;gt;
  &amp;lt;offer id="2"&amp;gt;
    &amp;lt;name&amp;gt;Mouse&amp;lt;/name&amp;gt;
    &amp;lt;price&amp;gt;19.90&amp;lt;/price&amp;gt;
  &amp;lt;/offer&amp;gt;
&amp;lt;/catalog&amp;gt;
XML&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastXmlParser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;extractPrettyPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XMLReader&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nb"&gt;json_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="no"&gt;JSON_PRETTY_PRINT&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;JSON_UNESCAPED_SLASHES&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;unlink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you the same extraction-first workflow as in the earlier articles, but with output that is already convenient for application code.&lt;/p&gt;

&lt;p&gt;So the package is not making you choose between streaming and readable arrays.&lt;/p&gt;

&lt;p&gt;It is designed to give you both.&lt;/p&gt;

&lt;h2&gt;
  
  
  A useful decision rule
&lt;/h2&gt;

&lt;p&gt;For practical PHP work, I think the following rule holds up well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use &lt;strong&gt;&lt;code&gt;prettyPrint()&lt;/code&gt;&lt;/strong&gt; when you want readable arrays now;&lt;/li&gt;
&lt;li&gt;use &lt;strong&gt;&lt;code&gt;convert()&lt;/code&gt;&lt;/strong&gt; when you want a stable internal node model;&lt;/li&gt;
&lt;li&gt;use &lt;strong&gt;&lt;code&gt;XmlElement&lt;/code&gt;&lt;/strong&gt; when you want to traverse normalized arrays more comfortably;&lt;/li&gt;
&lt;li&gt;use &lt;strong&gt;&lt;code&gt;extractPrettyPrint()&lt;/code&gt;&lt;/strong&gt; when the XML is large and you only want selected records in readable form;&lt;/li&gt;
&lt;li&gt;use &lt;strong&gt;&lt;code&gt;extractHierarchy()&lt;/code&gt;&lt;/strong&gt; when the XML is large and you want selected records in normalized form.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a much more actionable way to think about XML work than asking for a single “best XML library.”&lt;/p&gt;

&lt;h2&gt;
  
  
  One more practical point: XML should not leak everywhere
&lt;/h2&gt;

&lt;p&gt;I think one of the easiest mistakes in integration code is to let transport concerns leak too far.&lt;/p&gt;

&lt;p&gt;A feed arrives as XML, so suddenly everything downstream starts thinking in XML terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;node trees;&lt;/li&gt;
&lt;li&gt;cursor state;&lt;/li&gt;
&lt;li&gt;fragment parsing;&lt;/li&gt;
&lt;li&gt;nested traversal rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is usually unnecessary.&lt;/p&gt;

&lt;p&gt;A much cleaner architecture is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;receive XML;&lt;/li&gt;
&lt;li&gt;convert it into a representation your application actually likes;&lt;/li&gt;
&lt;li&gt;keep business logic focused on plain PHP data.&lt;/li&gt;
&lt;/ol&gt;
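
&lt;p&gt;The three steps above can be sketched roughly as follows. Both function names and the flat input shape are assumptions made for this illustration; they are not part of XmlExtractKit and not the exact array format it returns:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;/**
 * Boundary step: turn loosely typed strings from the XML edge into a typed record.
 * The flat input shape is an assumption for this sketch.
 */
function toOfferRecord(array $raw): array
{
    return [
        'id' =&amp;gt; (string) ($raw['id'] ?? ''),
        'available' =&amp;gt; ($raw['available'] ?? 'false') === 'true',
        'name' =&amp;gt; trim((string) ($raw['name'] ?? '')),
        'price' =&amp;gt; (float) ($raw['price'] ?? 0),
    ];
}

/** Business rule: works on plain PHP data and knows nothing about XML. */
function isSellable(array $offer): bool
{
    return $offer['available'] &amp;amp;&amp;amp; $offer['price'] &amp;gt; 0;
}

$record = toOfferRecord([
    'id' =&amp;gt; '1001',
    'available' =&amp;gt; 'true',
    'name' =&amp;gt; ' Keyboard ',
    'price' =&amp;gt; '49.90',
]);

var_dump(isSellable($record)); // bool(true)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only &lt;code&gt;toOfferRecord()&lt;/code&gt; has to change if the feed format changes; &lt;code&gt;isSellable()&lt;/code&gt; can be tested with hand-written arrays.&lt;/p&gt;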

&lt;p&gt;This is exactly why array-first conversion is so useful. It creates a boundary.&lt;/p&gt;

&lt;p&gt;The XML stays near the integration edge, where it belongs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In modern PHP projects, XML is often not the thing you want to work with. It is the thing you need to get past.&lt;/p&gt;

&lt;p&gt;That is why converting XML feeds to plain PHP arrays is such a practical strategy.&lt;/p&gt;

&lt;p&gt;Readable arrays are ideal when you want immediate application-friendly data. Normalized arrays are ideal when you want a stable traversal model. And for large feeds, streaming extraction lets you keep the memory-safe approach without sacrificing useful output.&lt;/p&gt;

&lt;p&gt;That combination is what I wanted from XmlExtractKit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XML as input;&lt;/li&gt;
&lt;li&gt;arrays as output;&lt;/li&gt;
&lt;li&gt;streaming when needed;&lt;/li&gt;
&lt;li&gt;low friction in the application layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that is the kind of PHP XML workflow you deal with, &lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt; is built for exactly that use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require sbwerewolf/xml-navigator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explore the demo project
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SbWereWolf/xml-extract-kit-demo-repo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xml-extract-kit-demo-repo
composer &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>xml</category>
      <category>xmlreader</category>
      <category>etl</category>
      <category>integration</category>
    </item>
    <item>
      <title>XMLReader vs XmlExtractKit for Real XML Extraction Tasks in PHP</title>
      <dc:creator>Nicholas Volkhin</dc:creator>
      <pubDate>Tue, 14 Apr 2026 18:50:24 +0000</pubDate>
      <link>https://forem.com/sbwerewolf/xmlreader-vs-xmlextractkit-for-real-xml-extraction-tasks-in-php-1c43</link>
      <guid>https://forem.com/sbwerewolf/xmlreader-vs-xmlextractkit-for-real-xml-extraction-tasks-in-php-1c43</guid>
      <description>&lt;p&gt;When PHP developers compare XML approaches, the comparison often starts in the wrong place.&lt;/p&gt;

&lt;p&gt;It usually becomes a vague question like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What is the best XML library for PHP?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is too broad to be useful.&lt;/p&gt;

&lt;p&gt;In real projects, the question is usually much narrower:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I have a large XML file;&lt;/li&gt;
&lt;li&gt;it contains repeated business records;&lt;/li&gt;
&lt;li&gt;I only need some of those records;&lt;/li&gt;
&lt;li&gt;I want application-friendly PHP data, not a full in-memory XML tree.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not a general XML problem.&lt;/p&gt;

&lt;p&gt;It is an &lt;strong&gt;extraction task&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And for this kind of work, the most honest comparison is often not between two third-party packages. It is between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;raw &lt;code&gt;XMLReader&lt;/code&gt;&lt;/strong&gt;, where you write the extraction logic yourself;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;a focused extraction toolkit&lt;/strong&gt;, where the streaming model stays the same but the glue code becomes reusable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my case, that focused toolkit is &lt;strong&gt;XmlExtractKit&lt;/strong&gt;, published as &lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This article compares both approaches on the same practical task.&lt;/p&gt;

&lt;h2&gt;
  
  
  The task
&lt;/h2&gt;

&lt;p&gt;Suppose we have a large XML feed that contains repeated &lt;code&gt;&amp;lt;offer&amp;gt;&lt;/code&gt; records, mixed with other node types that we do not care about.&lt;/p&gt;

&lt;p&gt;We want to extract each offer into a PHP array with a shape like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Keyboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;49.90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the sample XML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;catalog&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;offer&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"1001"&lt;/span&gt; &lt;span class="na"&gt;available=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Keyboard&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;price&lt;/span&gt; &lt;span class="na"&gt;currency=&lt;/span&gt;&lt;span class="s"&gt;"USD"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;49.90&lt;span class="nt"&gt;&amp;lt;/price&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/offer&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;service&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"s-1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Warranty&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/service&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;offer&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"1002"&lt;/span&gt; &lt;span class="na"&gt;available=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Mouse&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;price&lt;/span&gt; &lt;span class="na"&gt;currency=&lt;/span&gt;&lt;span class="s"&gt;"USD"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;19.90&lt;span class="nt"&gt;&amp;lt;/price&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/offer&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/catalog&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a simple example, but it is representative of a lot of real XML integration work: repeated nodes, some attributes, some nested values, and other elements that should be ignored.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: raw XMLReader
&lt;/h2&gt;

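&lt;p&gt;The low-level memory-safe baseline in PHP is &lt;code&gt;XMLReader&lt;/code&gt;, a forward-only pull parser that reads the document node by node instead of building the whole tree in memory.&lt;/p&gt;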

&lt;p&gt;That makes it the right foundation for large-file extraction.&lt;/p&gt;

&lt;p&gt;Here is one way to solve the task with plain &lt;code&gt;XMLReader&lt;/code&gt; and a small amount of helper parsing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
        &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nv"&gt;$offerXml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readOuterXML&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;simplexml_load_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offerXml&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offer&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nv"&gt;$rows&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'available'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nb"&gt;var_export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$rows&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Keyboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;49.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'1002'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Mouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;19.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a perfectly valid solution.&lt;/p&gt;

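&lt;p&gt;It is memory-safe in the important sense: we never load the whole XML document into memory. We move through the stream and parse one matching &lt;code&gt;&amp;lt;offer&amp;gt;&lt;/code&gt; fragment at a time. The example still collects every row in &lt;code&gt;$rows&lt;/code&gt;; for a truly large feed you would process each row as it arrives instead.&lt;/p&gt;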

&lt;p&gt;For a one-off task, this may be enough.&lt;/p&gt;

&lt;p&gt;But there are tradeoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the raw XMLReader version costs you
&lt;/h2&gt;

&lt;p&gt;The raw &lt;code&gt;XMLReader&lt;/code&gt; version works, but its cost is not obvious when the example is this small.&lt;/p&gt;

&lt;p&gt;The real cost shows up later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;matching logic has to be repeated or abstracted;&lt;/li&gt;
&lt;li&gt;field extraction rules are embedded directly in the loop;&lt;/li&gt;
&lt;li&gt;nested XML handling becomes more verbose;&lt;/li&gt;
&lt;li&gt;attributes and text values require repeated manual decisions;&lt;/li&gt;
&lt;li&gt;optional fields quickly add conditionals;&lt;/li&gt;
&lt;li&gt;the same extraction pattern gets reimplemented across projects.&lt;/li&gt;
&lt;/ul&gt;
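
&lt;p&gt;To make the first and last points concrete, here is the kind of helper that tends to get written once the matching loop has to be reused. This is only a sketch: &lt;code&gt;streamElements()&lt;/code&gt; is a hypothetical name, it is not part of &lt;code&gt;XMLReader&lt;/code&gt; or XmlExtractKit, and it does not claim to mirror how the library works internally. The tiny inline feed exists only to keep the snippet self-contained.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;/**
 * Hypothetical helper: yields each element with the given name
 * as a SimpleXMLElement, one record at a time.
 */
function streamElements(string $uri, string $elementName): Generator
{
    $reader = new XMLReader();

    if (! $reader-&amp;gt;open($uri)) {
        throw new RuntimeException('Cannot open XML file.');
    }

    try {
        while ($reader-&amp;gt;read()) {
            if (
                $reader-&amp;gt;nodeType !== XMLReader::ELEMENT
                || $reader-&amp;gt;name !== $elementName
            ) {
                continue;
            }

            // Parse only the current fragment, never the whole document.
            $element = simplexml_load_string($reader-&amp;gt;readOuterXml());

            if ($element !== false) {
                yield $element;
            }
        }
    } finally {
        // Release the file handle when iteration ends.
        $reader-&amp;gt;close();
    }
}

// Throwaway sample feed, so the sketch runs on its own.
$uri = tempnam(sys_get_temp_dir(), 'feed-');
file_put_contents(
    $uri,
    '&amp;lt;catalog&amp;gt;&amp;lt;offer id="1001"/&amp;gt;&amp;lt;service id="s-1"/&amp;gt;&amp;lt;offer id="1002"/&amp;gt;&amp;lt;/catalog&amp;gt;'
);

// The matching logic now lives in one place; callers only name the element.
$ids = [];
foreach (streamElements($uri, 'offer') as $offer) {
    $ids[] = (string) $offer['id'];
}

unlink($uri);
var_export($ids);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Even with this helper, field extraction, optional values, and type conversion are still left to each caller, which is where the remaining points on the list come from.&lt;/p&gt;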

&lt;p&gt;This is the critical point: the issue is not whether &lt;code&gt;XMLReader&lt;/code&gt; is capable. It absolutely is.&lt;/p&gt;

&lt;p&gt;The issue is whether &lt;strong&gt;low-level cursor code is the right place to keep business extraction logic&lt;/strong&gt; once the project grows beyond a toy example.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 2: XmlExtractKit on top of XMLReader
&lt;/h2&gt;

&lt;p&gt;Now let us solve the same extraction task using &lt;strong&gt;XmlExtractKit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The important thing to understand is that the streaming model does not change. Under the hood, the workflow is still based on &lt;code&gt;XMLReader&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What changes is the level of abstraction.&lt;/p&gt;

&lt;p&gt;Instead of manually managing cursor flow and converting node fragments inline, the library lets me say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stream through the XML;&lt;/li&gt;
&lt;li&gt;select matching nodes;&lt;/li&gt;
&lt;li&gt;receive structured PHP arrays for those nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the same scenario using &lt;code&gt;FastXmlParser::extractHierarchy()&lt;/code&gt; and &lt;code&gt;XmlElement&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Navigation\XmlElement&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Parsing\FastXmlParser&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;FastXmlParser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;extractHierarchy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XMLReader&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
            &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$offerData&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;XmlElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offerData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;current&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;current&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nv"&gt;$rows&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'available'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nb"&gt;var_export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$rows&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is the same kind of application-level array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'1001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Keyboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;49.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;array&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'1002'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Mouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;19.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'currency'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'USD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the key comparison.&lt;/p&gt;

&lt;p&gt;Both approaches are streaming-based. Both avoid loading the full XML document into memory. Both can solve the same extraction task.&lt;/p&gt;

&lt;p&gt;The difference is where the complexity lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical difference
&lt;/h2&gt;

&lt;p&gt;With raw &lt;code&gt;XMLReader&lt;/code&gt;, the extraction loop carries several responsibilities at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traversal;&lt;/li&gt;
&lt;li&gt;node matching;&lt;/li&gt;
&lt;li&gt;fragment parsing;&lt;/li&gt;
&lt;li&gt;data mapping;&lt;/li&gt;
&lt;li&gt;shape normalization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With XmlExtractKit, traversal remains streaming-based, but extraction becomes more explicit and reusable.&lt;/p&gt;

&lt;p&gt;That matters because most XML integration code is not judged only by whether it works today. It is judged by what happens when you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add another field;&lt;/li&gt;
&lt;li&gt;support optional nodes;&lt;/li&gt;
&lt;li&gt;process another repeated element type;&lt;/li&gt;
&lt;li&gt;reuse the same extraction pattern in a second project;&lt;/li&gt;
&lt;li&gt;hand the code to someone else six months later.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the comparison is not just about performance. It is about &lt;strong&gt;where you want complexity to accumulate&lt;/strong&gt;.&lt;/p&gt;
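
&lt;p&gt;To make that concrete, here is a minimal sketch of what moving the complexity can look like even without a library: one generator that only handles traversal and matching, and one function that only handles mapping. The helper names &lt;code&gt;streamElements&lt;/code&gt; and &lt;code&gt;offerToRow&lt;/code&gt; and the inline sample feed are assumptions made for this illustration; they are not part of XmlExtractKit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Traversal and matching only (hypothetical helper, not a library API).
function streamElements(XMLReader $reader, string $name): Generator
{
    while ($reader-&amp;gt;read()) {
        if ($reader-&amp;gt;nodeType !== XMLReader::ELEMENT || $reader-&amp;gt;name !== $name) {
            continue;
        }

        $fragment = simplexml_load_string($reader-&amp;gt;readOuterXml());

        if ($fragment !== false) {
            yield $fragment;
        }
    }
}

// Mapping and normalization only: one offer fragment becomes one flat row.
function offerToRow(SimpleXMLElement $offer): array
{
    return [
        'id' =&amp;gt; (string) $offer['id'],
        'available' =&amp;gt; (string) $offer['available'] === 'true',
        'name' =&amp;gt; (string) $offer-&amp;gt;name,
        'price' =&amp;gt; (float) $offer-&amp;gt;price,
        'currency' =&amp;gt; (string) $offer-&amp;gt;price['currency'],
    ];
}

// Inline sample feed (assumed layout) so the sketch runs on its own.
$reader = new XMLReader();
$reader-&amp;gt;XML(
    '&amp;lt;catalog&amp;gt;'
    . '&amp;lt;offer id="1001" available="true"&amp;gt;&amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;'
    . '&amp;lt;price currency="USD"&amp;gt;49.90&amp;lt;/price&amp;gt;&amp;lt;/offer&amp;gt;'
    . '&amp;lt;offer id="1002" available="false"&amp;gt;&amp;lt;name&amp;gt;Mouse&amp;lt;/name&amp;gt;'
    . '&amp;lt;price currency="USD"&amp;gt;19.90&amp;lt;/price&amp;gt;&amp;lt;/offer&amp;gt;'
    . '&amp;lt;/catalog&amp;gt;'
);

$rows = [];

foreach (streamElements($reader, 'offer') as $offer) {
    $rows[] = offerToRow($offer);
}

$reader-&amp;gt;close();

var_export($rows);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this layout, adding a field touches only the mapper, and a second repeated element type can reuse the generator unchanged.&lt;/p&gt;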

&lt;h2&gt;
  
  
  What raw XMLReader is still excellent for
&lt;/h2&gt;

&lt;p&gt;It is worth being very clear here: this is not an argument against &lt;code&gt;XMLReader&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;XMLReader&lt;/code&gt; is the right foundation for large XML handling in PHP.&lt;/p&gt;

&lt;p&gt;And there are cases where staying close to the metal is still the best option:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the task is small and one-off;&lt;/li&gt;
&lt;li&gt;you need very custom cursor-level logic;&lt;/li&gt;
&lt;li&gt;the extraction rules are extremely specific;&lt;/li&gt;
&lt;li&gt;introducing another abstraction would not pay for itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When that is the case, use raw &lt;code&gt;XMLReader&lt;/code&gt; and move on.&lt;/p&gt;

&lt;p&gt;That is a completely reasonable engineering choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where XmlExtractKit starts paying off
&lt;/h2&gt;

&lt;p&gt;A focused extraction toolkit starts making sense when the job repeats.&lt;/p&gt;

&lt;p&gt;That usually means one or more of these are true:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XML files are large enough that streaming is mandatory;&lt;/li&gt;
&lt;li&gt;extraction is a recurring integration pattern;&lt;/li&gt;
&lt;li&gt;the codebase needs arrays, not XML trees;&lt;/li&gt;
&lt;li&gt;multiple projects solve similar feed or import tasks;&lt;/li&gt;
&lt;li&gt;you want a stable intermediate representation of XML records;&lt;/li&gt;
&lt;li&gt;you want the extraction code to read like the task, not like cursor 
choreography.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the use case I built &lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt; for.&lt;/p&gt;

&lt;p&gt;I did not want a general-purpose XML mega-toolkit. I wanted a practical way to keep the memory-safe streaming model while reducing how much extraction glue code I had to keep rewriting.&lt;/p&gt;

&lt;h2&gt;
  
  
  A more honest way to compare XML tools
&lt;/h2&gt;

&lt;p&gt;One of the reasons XML discussions become unhelpful is that people compare tools that are not aimed at the same job.&lt;/p&gt;

&lt;p&gt;A better comparison framework looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DOM / SimpleXML&lt;/strong&gt; when the document is small and full-tree 
convenience matters;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;raw XMLReader&lt;/strong&gt; when the file is large and the task is custom 
enough that low-level control is worth it;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XmlExtractKit&lt;/strong&gt; when the file is large, the task is 
extraction-focused, and you want structured arrays instead of 
repeated cursor glue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is much more useful than asking for a universal winner.&lt;/p&gt;

&lt;p&gt;There is no universal winner.&lt;/p&gt;

&lt;p&gt;There is only a better fit for the task in front of you.&lt;/p&gt;

&lt;h2&gt;
  
  
  So which one should you choose?
&lt;/h2&gt;

&lt;p&gt;Here is my practical answer.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;raw &lt;code&gt;XMLReader&lt;/code&gt;&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you want maximal control;&lt;/li&gt;
&lt;li&gt;the task is narrow;&lt;/li&gt;
&lt;li&gt;the extraction code will probably never be reused;&lt;/li&gt;
&lt;li&gt;a little extra boilerplate is acceptable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose &lt;strong&gt;XmlExtractKit&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you keep solving the same extraction problem repeatedly;&lt;/li&gt;
&lt;li&gt;you want the XML stage to produce structured PHP arrays;&lt;/li&gt;
&lt;li&gt;you want extraction code that is easier to read and maintain;&lt;/li&gt;
&lt;li&gt;you want to stay streaming-first without hand-writing the same 
conversion patterns again and again.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;For real XML extraction tasks in PHP, the main decision is usually not "which XML package is best?"&lt;/p&gt;

&lt;p&gt;It is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I want to keep solving this at the raw XMLReader level, or do I want a reusable extraction-oriented layer on top of the same streaming model?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the honest comparison.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;XMLReader&lt;/code&gt; is still the correct low-level foundation for large XML files.&lt;/p&gt;

&lt;p&gt;But if your actual problem is repeated extraction of business records into plain PHP arrays, then &lt;strong&gt;XmlExtractKit&lt;/strong&gt; (&lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;) is designed to make that workflow cleaner, more reusable, and easier to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require sbwerewolf/xml-navigator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explore the demo project
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SbWereWolf/xml-extract-kit-demo-repo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xml-extract-kit-demo-repo
composer &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>xml</category>
      <category>xmlreader</category>
      <category>etl</category>
      <category>integration</category>
    </item>
    <item>
      <title>How to Parse Large XML Files in PHP Without Running Out of Memory</title>
      <dc:creator>Nicholas Volkhin</dc:creator>
      <pubDate>Sun, 12 Apr 2026 20:01:05 +0000</pubDate>
      <link>https://forem.com/sbwerewolf/how-to-parse-large-xml-files-in-php-without-running-out-of-memory-234o</link>
      <guid>https://forem.com/sbwerewolf/how-to-parse-large-xml-files-in-php-without-running-out-of-memory-234o</guid>
      <description>&lt;p&gt;XML is still everywhere: supplier feeds, marketplace catalogs, partner exports, legacy APIs, SOAP-ish payloads, ETL jobs. None of that is glamorous, but plenty of production systems still depend on it.&lt;/p&gt;

&lt;p&gt;The real problem starts when the file is no longer small.&lt;/p&gt;

&lt;p&gt;At that point, the question is not really &lt;strong&gt;"How do I parse XML in PHP?"&lt;/strong&gt; It becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I process a large XML document safely, extract only the records I care about, and keep the rest of my application working with normal PHP data structures?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is a very different problem.&lt;/p&gt;

&lt;p&gt;In many real-world integrations, you do not need the whole XML document in memory. You do not need to traverse every branch of the tree. You do not need a rich DOM-style model.&lt;/p&gt;

&lt;p&gt;You usually need something much simpler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scan the file efficiently;&lt;/li&gt;
&lt;li&gt;find repeated business records such as &lt;code&gt;product&lt;/code&gt;, &lt;code&gt;offer&lt;/code&gt;, or &lt;code&gt;item&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;extract those records;&lt;/li&gt;
&lt;li&gt;turn them into arrays;&lt;/li&gt;
&lt;li&gt;pass them to the rest of your pipeline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the approach I use in modern PHP projects, and it is the one I recommend for large XML workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why naive XML parsing stops working
&lt;/h2&gt;

&lt;p&gt;For small files, the usual PHP XML tools are perfectly fine.&lt;/p&gt;

&lt;p&gt;A typical first solution looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;simplexml_load_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// process product&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is nothing wrong with that when the file is small and the document structure is simple.&lt;/p&gt;

&lt;p&gt;The trouble is that this style of code implicitly treats the XML file as something you want to load and work with as a whole. For large feeds, that is often the wrong tradeoff.&lt;/p&gt;

&lt;p&gt;If you only need repeated business records from a large XML file, materializing the entire document in memory is unnecessary work. It also makes your pipeline more fragile as feeds grow over time.&lt;/p&gt;

&lt;p&gt;This is why large-XML handling should start with a different mental model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not load the document. Stream through it and extract only what matters.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The real task is usually extraction, not XML manipulation
&lt;/h2&gt;

&lt;p&gt;In practice, most XML processing jobs in application code look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the file contains many repeated records;&lt;/li&gt;
&lt;li&gt;you only need a subset of them;&lt;/li&gt;
&lt;li&gt;you only need some fields from each record;&lt;/li&gt;
&lt;li&gt;the result will end up in arrays, JSON, a database, or a queue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the business task is usually not "work with XML as a document."&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find the repeated records I care about and turn them into application-friendly data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That distinction matters because it leads directly to the right low-memory approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  The memory-safe foundation: XMLReader
&lt;/h2&gt;

&lt;p&gt;In PHP, the standard low-level tool for memory-safe XML traversal is &lt;code&gt;XMLReader&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Instead of loading the entire document, it lets you move through the XML cursor-style, node by node.&lt;/p&gt;

&lt;p&gt;That is exactly what you want when the file is large.&lt;/p&gt;

&lt;p&gt;Here is a minimal baseline example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'product'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$nodeXml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readOuterXML&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

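        &lt;span class="nv"&gt;$product&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;simplexml_load_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$nodeXml&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$product&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// skip fragments that fail to parse&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;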

        &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;

        &lt;span class="c1"&gt;// process $data immediately&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



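&lt;p&gt;This is already much better than loading the full file up front: at any moment only the current &lt;code&gt;product&lt;/code&gt; fragment is materialized, so memory use tracks the size of one record rather than the size of the file.&lt;/p&gt;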

&lt;p&gt;It gives you the right execution model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sequential reading;&lt;/li&gt;
&lt;li&gt;low memory pressure;&lt;/li&gt;
&lt;li&gt;immediate processing of extracted records.&lt;/li&gt;
&lt;/ul&gt;
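
&lt;p&gt;One way to check the low-memory claim on your own feeds is to log peak memory while streaming. The sketch below is only an illustration: the generated file, the record count, and the element names are made up for the demo, and the reported numbers will vary by PHP version and platform.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Build a throwaway feed so the sketch is self-contained (assumed layout).
$path = tempnam(sys_get_temp_dir(), 'feed-');
$handle = fopen($path, 'wb');
fwrite($handle, '&amp;lt;products&amp;gt;');

for ($i = 1; $i &amp;lt;= 50000; $i++) {
    fwrite($handle, "&amp;lt;product&amp;gt;&amp;lt;id&amp;gt;$i&amp;lt;/id&amp;gt;&amp;lt;name&amp;gt;Item $i&amp;lt;/name&amp;gt;&amp;lt;/product&amp;gt;");
}

fwrite($handle, '&amp;lt;/products&amp;gt;');
fclose($handle);

$bytes = filesize($path);
$reader = new XMLReader();

if (! $reader-&amp;gt;open($path)) {
    throw new RuntimeException('Cannot open XML file.');
}

$count = 0;

// Stream the file and materialize one record at a time.
while ($reader-&amp;gt;read()) {
    if ($reader-&amp;gt;nodeType === XMLReader::ELEMENT &amp;amp;&amp;amp; $reader-&amp;gt;name === 'product') {
        $product = simplexml_load_string($reader-&amp;gt;readOuterXml());
        $count++; // a real job would process $product here
    }
}

$reader-&amp;gt;close();
unlink($path);

// Compare the file size with the peak memory PHP actually reserved.
printf(
    "records: %d, file: %.1f MiB, peak memory: %.1f MiB\n",
    $count,
    $bytes / 1048576,
    memory_get_peak_usage(true) / 1048576
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Raising the record count and running it again is a quick way to see whether peak memory follows the file size or stays roughly flat.&lt;/p&gt;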

&lt;p&gt;If your XML task is simple and one-off, this may be enough.&lt;/p&gt;

&lt;p&gt;But once you do this in more than one project, the weak points show up quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where raw XMLReader starts to hurt
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;XMLReader&lt;/code&gt; is powerful, but it is also low-level.&lt;/p&gt;

&lt;p&gt;The moment your extraction task becomes slightly more realistic, you start accumulating glue code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated node-selection logic;&lt;/li&gt;
&lt;li&gt;conversion of XML fragments into arrays;&lt;/li&gt;
&lt;li&gt;nested element handling;&lt;/li&gt;
&lt;li&gt;attributes versus values;&lt;/li&gt;
&lt;li&gt;optional nodes;&lt;/li&gt;
&lt;li&gt;repeated fields like multiple &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; tags;&lt;/li&gt;
&lt;li&gt;serialization to JSON-friendly structures;&lt;/li&gt;
&lt;li&gt;duplicated extraction code across projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, memory is no longer the only concern.&lt;/p&gt;

&lt;p&gt;Maintainability becomes the real cost.&lt;/p&gt;

&lt;p&gt;This is the line I care about most in application code: not just "can I stream it," but "can I keep the extraction logic readable after the third similar integration?"&lt;/p&gt;
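
&lt;p&gt;To give a feel for how this glue code grows, here is a sketch of a hand-written mapper for a single fragment that already has to deal with three items from the list above: attributes versus values, an optional node, and repeated &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; tags. The function name &lt;code&gt;mapOffer&lt;/code&gt;, the &lt;code&gt;oldprice&lt;/code&gt; element, and the sample URLs are hypothetical and exist only for this illustration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;// Hypothetical hand-written mapper for one offer fragment.
function mapOffer(SimpleXMLElement $offer): array
{
    $pictures = [];

    // Repeated field: collect every picture, even when there is only one.
    foreach ($offer-&amp;gt;picture as $picture) {
        $pictures[] = (string) $picture;
    }

    return [
        // Attribute versus value: id is an attribute, name is element text.
        'id' =&amp;gt; (string) $offer['id'],
        'name' =&amp;gt; (string) $offer-&amp;gt;name,
        'price' =&amp;gt; (float) $offer-&amp;gt;price,
        'currency' =&amp;gt; (string) $offer-&amp;gt;price['currency'],
        // Optional node: null when the oldprice element is absent.
        'old_price' =&amp;gt; isset($offer-&amp;gt;oldprice) ? (float) $offer-&amp;gt;oldprice : null,
        'pictures' =&amp;gt; $pictures,
    ];
}

$offer = simplexml_load_string(&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;offer id="1001"&amp;gt;
  &amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;
  &amp;lt;price currency="USD"&amp;gt;49.90&amp;lt;/price&amp;gt;
  &amp;lt;picture&amp;gt;https://example.com/a.jpg&amp;lt;/picture&amp;gt;
  &amp;lt;picture&amp;gt;https://example.com/b.jpg&amp;lt;/picture&amp;gt;
&amp;lt;/offer&amp;gt;
XML);

$row = mapOffer($offer);

var_export($row);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;None of this is difficult, but every new feed tends to need its own slightly different copy of it.&lt;/p&gt;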

&lt;h2&gt;
  
  
  A more practical extraction-first approach
&lt;/h2&gt;

&lt;p&gt;This is exactly why I built &lt;strong&gt;XmlExtractKit&lt;/strong&gt; for PHP, published as &lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The goal is not to replace &lt;code&gt;XMLReader&lt;/code&gt;, but to keep its streaming model while moving application code closer to the actual business task.&lt;/p&gt;

&lt;p&gt;Instead of managing the cursor manually and assembling records by hand, I want code that says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open a large XML stream;&lt;/li&gt;
&lt;li&gt;match the elements I care about;&lt;/li&gt;
&lt;li&gt;get plain PHP arrays back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a streaming example using the library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Parsing\FastXmlParser&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tempnam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sys_get_temp_dir&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'xml-extract-kit-'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nb"&gt;file_put_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;catalog&amp;gt;
  &amp;lt;offer id="1001" available="true"&amp;gt;
    &amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;49.90&amp;lt;/price&amp;gt;
  &amp;lt;/offer&amp;gt;
  &amp;lt;service id="s-1"&amp;gt;
    &amp;lt;name&amp;gt;Warranty&amp;lt;/name&amp;gt;
  &amp;lt;/service&amp;gt;
  &amp;lt;offer id="1002" available="false"&amp;gt;
    &amp;lt;name&amp;gt;Mouse&amp;lt;/name&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;19.90&amp;lt;/price&amp;gt;
  &amp;lt;/offer&amp;gt;
&amp;lt;/catalog&amp;gt;
XML&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastXmlParser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;extractPrettyPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XMLReader&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nb"&gt;json_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="no"&gt;JSON_PRETTY_PRINT&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;JSON_UNESCAPED_SLASHES&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;unlink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is application-friendly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"offer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Keyboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"49.90"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"offer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1002"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"false"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mouse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"19.90"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is still a streaming workflow. The difference is that the code is now centered on the extraction task instead of low-level cursor management.&lt;/p&gt;

&lt;p&gt;That becomes more valuable when the XML structure is nested, partially optional, or reused across multiple integrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why plain arrays are often the right output
&lt;/h2&gt;

&lt;p&gt;A lot of application code does not really want XML.&lt;/p&gt;

&lt;p&gt;It wants data.&lt;/p&gt;

&lt;p&gt;Once the relevant record has been extracted, the rest of the system usually prefers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;plain arrays;&lt;/li&gt;
&lt;li&gt;normalized values;&lt;/li&gt;
&lt;li&gt;JSON-ready structures;&lt;/li&gt;
&lt;li&gt;data that can be validated, transformed, and persisted.&lt;/li&gt;
&lt;/ul&gt;
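
&lt;p&gt;As a rough illustration of what "normalized values" can mean for the offer shape shown earlier, here is a minimal sketch. The function name &lt;code&gt;normalizeOffer&lt;/code&gt; and the output keys are hypothetical and not part of any library; the type choices (integer id, boolean availability, float price) are assumptions about what downstream code might want.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
declare(strict_types=1);

// Hypothetical helper: flattens one extracted "offer" array
// (with "@attributes" and "@value" keys) into typed, JSON-ready values.
function normalizeOffer(array $offer): array
{
    $attributes = $offer['@attributes'] ?? [];
    $price = $offer['price'] ?? [];

    return [
        'id' =&amp;gt; (int) ($attributes['id'] ?? 0),
        // XML attributes arrive as strings, so compare against 'true' explicitly.
        'available' =&amp;gt; ($attributes['available'] ?? 'false') === 'true',
        'name' =&amp;gt; (string) ($offer['name'] ?? ''),
        'price' =&amp;gt; (float) ($price['@value'] ?? 0),
        'currency' =&amp;gt; (string) ($price['@attributes']['currency'] ?? ''),
    ];
}

$record = normalizeOffer([
    '@attributes' =&amp;gt; ['id' =&amp;gt; '1001', 'available' =&amp;gt; 'true'],
    'name' =&amp;gt; 'Keyboard',
    'price' =&amp;gt; ['@value' =&amp;gt; '49.90', '@attributes' =&amp;gt; ['currency' =&amp;gt; 'USD']],
]);

echo json_encode($record), PHP_EOL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After this step nothing in the record still looks like XML, which is usually what validation and persistence code expects.&lt;/p&gt;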

&lt;p&gt;That is why I think "XML extraction" is a more useful framing than "XML handling."&lt;/p&gt;

&lt;p&gt;Most business systems do not want to live inside an XML tree. They want to move past it as quickly as possible.&lt;/p&gt;

&lt;p&gt;If the XML document is just a transport format, then the best workflow is usually:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;XML stream -&amp;gt; selected nodes -&amp;gt; PHP arrays&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the design center of my library.&lt;/p&gt;
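
&lt;p&gt;To make that pipeline concrete without relying on any particular library API, here is a minimal sketch that uses only PHP's built-in &lt;code&gt;XMLReader&lt;/code&gt; and &lt;code&gt;SimpleXMLElement&lt;/code&gt;. The function name &lt;code&gt;streamOffers&lt;/code&gt;, the &lt;code&gt;catalog&lt;/code&gt; root element, and the output keys are assumptions made for illustration. This is not XmlExtractKit's API, only the same idea written by hand.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
declare(strict_types=1);

// Hypothetical helper (not XmlExtractKit's API): streams the file at $path
// and yields one plain array per &amp;lt;offer&amp;gt; element.
function streamOffers(string $path): Generator
{
    $reader = new XMLReader();
    if ($reader-&amp;gt;open($path) === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    try {
        // XML stream: advance the cursor to the first &amp;lt;offer&amp;gt; element.
        $onOffer = false;
        while ($reader-&amp;gt;read()) {
            if ($reader-&amp;gt;nodeType === XMLReader::ELEMENT) {
                if ($reader-&amp;gt;name === 'offer') {
                    $onOffer = true;
                    break;
                }
            }
        }

        while ($onOffer) {
            // Selected node: materialize only the current record.
            $offer = new SimpleXMLElement($reader-&amp;gt;readOuterXml());

            // PHP array: hand plain data to the rest of the application.
            yield [
                'id' =&amp;gt; (string) $offer['id'],
                'name' =&amp;gt; (string) $offer-&amp;gt;name,
                'price' =&amp;gt; (string) $offer-&amp;gt;price,
                'currency' =&amp;gt; (string) $offer-&amp;gt;price['currency'],
            ];

            // Skip the subtree just read and move to the next &amp;lt;offer&amp;gt;.
            $onOffer = $reader-&amp;gt;next('offer');
        }
    } finally {
        $reader-&amp;gt;close();
    }
}

// Sample feed written to a temporary file so the sketch runs on its own.
$xml = '&amp;lt;catalog&amp;gt;'
    . '&amp;lt;offer id="1001"&amp;gt;&amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;'
    . '&amp;lt;price currency="USD"&amp;gt;49.90&amp;lt;/price&amp;gt;&amp;lt;/offer&amp;gt;'
    . '&amp;lt;offer id="1002"&amp;gt;&amp;lt;name&amp;gt;Mouse&amp;lt;/name&amp;gt;'
    . '&amp;lt;price currency="USD"&amp;gt;19.90&amp;lt;/price&amp;gt;&amp;lt;/offer&amp;gt;'
    . '&amp;lt;/catalog&amp;gt;';
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, $xml);

$rows = [];
foreach (streamOffers($path) as $row) {
    $rows[] = $row; // a real import would persist or enqueue each row here
}
unlink($path);

echo json_encode($rows), PHP_EOL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only one record is held as a parsed object at a time, so memory use depends on the size of a single record rather than the size of the file. The cursor handling in this sketch is the glue code that tends to get rewritten from project to project.&lt;/p&gt;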

&lt;h2&gt;
  
  
  When this approach makes sense
&lt;/h2&gt;

&lt;p&gt;This style of XML processing works especially well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML file is large;&lt;/li&gt;
&lt;li&gt;the document contains many repeated records;&lt;/li&gt;
&lt;li&gt;you only need part of the document;&lt;/li&gt;
&lt;li&gt;the extracted data should be processed immediately;&lt;/li&gt;
&lt;li&gt;the rest of the application works with arrays, not DOM objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supplier and marketplace feeds;&lt;/li&gt;
&lt;li&gt;product catalogs;&lt;/li&gt;
&lt;li&gt;partner imports and exports;&lt;/li&gt;
&lt;li&gt;ETL jobs;&lt;/li&gt;
&lt;li&gt;queue payload preparation;&lt;/li&gt;
&lt;li&gt;legacy integration endpoints that still speak XML.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When you probably do not need it
&lt;/h2&gt;

&lt;p&gt;There are also cases where this is the wrong tool.&lt;/p&gt;

&lt;p&gt;You probably do not need a streaming extraction approach when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML is small;&lt;/li&gt;
&lt;li&gt;loading the whole file is acceptable;&lt;/li&gt;
&lt;li&gt;you need full-document manipulation;&lt;/li&gt;
&lt;li&gt;your task is closer to DOM transformation than record extraction;&lt;/li&gt;
&lt;li&gt;the XML structure is simple enough that a tiny one-off script is enough.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is important to say explicitly.&lt;/p&gt;

&lt;p&gt;Not every XML task needs an extraction-first workflow. But the ones that do usually benefit from it immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  A useful rule of thumb
&lt;/h2&gt;

&lt;p&gt;Here is the simplest practical rule I know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if the XML is small and you need the whole document, convenience APIs are fine;&lt;/li&gt;
&lt;li&gt;if the XML is large and you only need repeated records, stream it;&lt;/li&gt;
&lt;li&gt;if you keep solving the same streaming extraction problem in multiple projects, stop writing the same glue code over and over.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the point where a focused library becomes worth it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Large XML files are not primarily a parsing problem.&lt;/p&gt;

&lt;p&gt;They are an extraction problem.&lt;/p&gt;

&lt;p&gt;If you treat them like full in-memory documents, you often pay too much in memory and complexity. If you treat them like streams of repeated business records, the solution becomes safer, simpler, and much easier to fit into modern PHP pipelines.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;XMLReader&lt;/code&gt; gives you the right low-level foundation for that model.&lt;/p&gt;

&lt;p&gt;And if your real task is not "load XML," but "extract matching records and turn them into plain PHP arrays," then &lt;strong&gt;XmlExtractKit&lt;/strong&gt; (&lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;) was built exactly for that workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require sbwerewolf/xml-navigator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explore the demo project
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SbWereWolf/xml-extract-kit-demo-repo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xml-extract-kit-demo-repo
composer &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>opensource</category>
      <category>php</category>
      <category>xml</category>
      <category>parsing</category>
    </item>
  </channel>
</rss>
