for `GPT.block0`, and [node_4] for `GPT.block2` respectfully.

    After the analysis step, a hierarchical representation is generated.

    For above example, the representation is:

        _ModuleNode(GPT)
            _ModuleNode(block0)
                _LeafNode(node_0)
            _ModuleNode(block1)
                _LeafNode(node_1)
                _ModuleNode(Attention1)
                    _ModuleNode(MLP)
                        _LeafNode(node_2)
                _LeafNode(node_3)
            _ModuleNode(block2)
                _LeafNode(node_4)

    Construction step
    -----------------

    The second step is to build the actual `call_module` node and the sub `fx.GraphModule`.
    This is done recursively from the leaf `_ModuleNode` to the root.

    For example, the first submodule to be built is `GPT.block1.Attention1.MLP`. Below pair
    is generated from `_ModuleNode(MLP)`.

        fx.GraphModule(GPT.block1.Attention1.MLP)
            graph:
                node_2

        new_mlp_node = `call_module[GPT.block1.Attention1.MLP](...)`

    Next, the `GPT.block1.Attention1` submodule is built. Below is generated from
    `_ModuleNode(Attention1)`.

        fx.GraphModule(GPT.block1.Attention1)
            graph:
                new_mlp_node
                node_3

        new_attention1_node = `call_module[GPT.block1.Attention1](...)`

    Until every submodule is built, the new modularized `fx.GraphModule` is generated.

    Alternatives
    ------------

    The current algorithm adopts a top down approach. A bottom up approach is similar.
    In contrast to these two, an alternative flat order approach is also possible, where
    each node is traversed and copied to the corresponding submodule.

    The advantage of the current approach lies in the encapsulation of the fx.GraphModule
    construction for each individual submodule within a single `build_module` method, which
    can be called separately once the analysis phase is completed, making debugging more
    convenient.

    Regarding construction step, an alternative implementation is to utilize `fx.Interpreter`
    for traversing all the nodes under the flattened root module and copying the nodes
    into their respective submodule under construction. This approach is not adopted because

        1. It uses the flat order approach discussed above. This means one cannot individually
    construct a submodule and examine it while debugging.

        2. The graph execution functionality of `fx.Interpreter` is not necessary for the
    purpose of this pass. Ignoring that, `fx.Interpreter.run` achieves the same effect
    as a for loop over all the nodes.
    r