
load XML file iteration

Hi,
I have an XML file (see attached file).
This file is composed of blocks, lines, words and characters:

Every block is composed of 1,...,n lines
Every line is composed of 1,...,k words
Every word is composed of 1,...,l characters

I am trying to create objects as follows:
Block(int top, int left, int bottom, int right, vector<Line>)
Line(int top, int left, int bottom, int right, vector<Word>)
Word(int top, int left, int bottom, int right, vector<Character>)
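A minimal sketch of this object model, assuming value semantics (each object stores its children directly in a std::vector) and integer pixel coordinates. The names Box, Character, value and countCharacters are illustrative only and do not come from the attached file:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Bounding box shared by every level (integer pixel coordinates assumed).
struct Box {
    int top = 0, left = 0, bottom = 0, right = 0;
};

// Leaf level; 'value' (the recognised text) is a hypothetical field.
struct Character {
    Box box;
    std::string value;
};

// Each level owns its children by value, so nothing has to be deleted by hand.
struct Word  { Box box; std::vector<Character> characters; };
struct Line  { Box box; std::vector<Word> words; };
struct Block { Box box; std::vector<Line> lines; };

// Walks block -> lines -> words and counts the characters.
inline std::size_t countCharacters(const Block& block) {
    std::size_t n = 0;
    for (const Line& line : block.lines)
        for (const Word& word : line.words)
            n += word.characters.size();
    return n;
}
```

With this layout, "linking" a line to its block simply means pushing it into that block's lines vector.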



I am using TinyXML in C++, but I can't link the objects together: my code only handles one object (block, line, word or character) at a time.

void Keywords::checkChild(TiXmlElement *child)
{
    if (child)
    {
        if (std::string(child->Value()) == "block")
        {
            std::cout << child->Value() << std::endl;

            // atof rather than atoi, since the coordinates are stored as double
            double x1 = atof(child->Attribute("left"));
            double y1 = atof(child->Attribute("top"));
            double x2 = atof(child->Attribute("right"));
            double y2 = atof(child->Attribute("bottom"));
            // vector<Line> lineList;
            // blockList.push_back(new Block(y1, x1, y2, x2, lineList));
        }

        // Recurse into the children, then the siblings. The current block is
        // not passed along, so a line cannot know which block it belongs to.
        checkChild(child->FirstChildElement());
        checkChild(child->NextSiblingElement());
    } // end if child
}


Thank you.
Attached file: 00000012-1-R.xml

Asked by: HaniDaher
TommySzalapski commented:
You need to have a different function for each type (or if they all have the same attributes you could use templates).

Something like this:
void Keywords::checkBlock(TiXmlElement *child)
{
    if (child && std::string(child->Value()) == "block")
    {
        std::cout << child->Value() << std::endl;

        double x1 = atof(child->Attribute("left"));
        double y1 = atof(child->Attribute("top"));
        double x2 = atof(child->Attribute("right"));
        double y2 = atof(child->Attribute("bottom"));
        blockList.push_back(new Block(y1, x1, y2, x2));

        // Every child element of a block is a line: hand it the new block
        for (TiXmlElement *line = child->FirstChildElement(); line; line = line->NextSiblingElement())
            checkLine(line, blockList.back());
    }
}

void Keywords::checkLine(TiXmlElement *child, Block *block)
{
    if (child && std::string(child->Value()) == "line")
    {
        std::cout << child->Value() << std::endl;

        double m = atof(child->Attribute("slope")); // or whatever attributes a line has
        double x0 = atof(child->Attribute("intercept"));
        block->m_line_list.push_back(new Line(m, x0));

        // Every child element of a line is a word: hand it the new line
        for (TiXmlElement *word = child->FirstChildElement(); word; word = word->NextSiblingElement())
            checkWord(word, block->m_line_list.back());
    }
}
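Tommy notes that templates are an option when all the types share the same attributes. A minimal sketch of that idea follows. To stay self-contained it uses a hypothetical Node struct in place of TiXmlElement, and the tag names "block", "line", "word" and "char" are assumptions (taken from HaniDaher's code further down, not from the attached file). With TinyXML the loop would use FirstChildElement(name)/NextSiblingElement(name) and Attribute() instead.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for TiXmlElement: tag name, attributes, child elements.
struct Node {
    std::string name;
    std::map<std::string, std::string> attrs;
    std::vector<Node> children;
};

struct Box { int top = 0, left = 0, bottom = 0, right = 0; };

// Reads one integer attribute; a missing attribute defaults to 0.
inline int attr(const Node& n, const std::string& key) {
    auto it = n.attrs.find(key);
    return it == n.attrs.end() ? 0 : std::atoi(it->second.c_str());
}

inline Box readBox(const Node& n) {
    return Box{attr(n, "top"), attr(n, "left"), attr(n, "bottom"), attr(n, "right")};
}

// One generic element type: a box plus children of type Child.
template <typename Child>
struct Element {
    Box box;
    std::vector<Child> children;
};

struct Character { Box box; };
using Word  = Element<Character>;
using Line  = Element<Word>;
using Block = Element<Line>;

// Expected tag name for each child type (assumed names).
template <typename T> struct Tag;
template <> struct Tag<Character> { static const char* name() { return "char"; } };
template <> struct Tag<Word>      { static const char* name() { return "word"; } };
template <> struct Tag<Line>      { static const char* name() { return "line"; } };

// Leaf level: a character only has a box.
inline void parse(const Node& n, Character& c) { c.box = readBox(n); }

// One template covers Word, Line and Block: read the box, then recurse into
// every child element whose tag matches the child type.
template <typename Child>
void parse(const Node& n, Element<Child>& e) {
    e.box = readBox(n);
    for (const Node& sub : n.children) {
        if (sub.name != Tag<Child>::name()) continue;
        e.children.push_back(Child{});
        parse(sub, e.children.back());
    }
}

inline std::vector<Block> parseDocument(const Node& root) {
    std::vector<Block> blocks;
    for (const Node& sub : root.children) {
        if (sub.name != "block") continue;
        blocks.push_back(Block{});
        parse(sub, blocks.back());
    }
    return blocks;
}
```

The per-level functions collapse into one template plus a small traits struct for the tag names; the trade-off is that every level must really have the same shape.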

HaniDaher (author) commented:
Yes, Tommy, that's what I thought. I actually managed to find the following solution:
// Forward declarations, since parseFile calls parseBlock before it is defined
Block* parseBlock(TiXmlElement* element);
Line* parseLine(TiXmlElement* element);
Word* parseWord(TiXmlElement* element);
Char* parseChar(TiXmlElement* element);

void parseFile(TiXmlElement* document, vector<Block*>& blocks)
{
  for (TiXmlElement* sub = document->FirstChildElement("block"); sub; sub = sub->NextSiblingElement("block"))
    blocks.push_back(parseBlock(sub));
}
Block* parseBlock(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Line*> lines;
  for (TiXmlElement* sub = element->FirstChildElement("line"); sub; sub = sub->NextSiblingElement("line"))
    lines.push_back(parseLine(sub));
  return new Block(x1, ..., lines);
}
Line* parseLine(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Word*> words;
  for (TiXmlElement* sub = element->FirstChildElement("word"); sub; sub = sub->NextSiblingElement("word"))
    words.push_back(parseWord(sub));
  return new Line(x1, ..., words);
}
Word* parseWord(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Char*> chars;
  for (TiXmlElement* sub = element->FirstChildElement("char"); sub; sub = sub->NextSiblingElement("char"))
    chars.push_back(parseChar(sub));
  return new Word(x1, ..., chars);
}
Char* parseChar(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  return new Char(x1, ...);
}


I think it is basically the same idea as yours.
What do you think about the above code?
TommySzalapski commented:
Yes, that is the same basic idea. Looks like it would work. Personally, I would try to avoid all those calls to new so you don't have to worry about cleaning up all the memory later.
Something like this:
void parseFile(TiXmlElement* document, vector<Block>& blocks)
{
  for (TiXmlElement* sub = document->FirstChildElement("block"); sub; sub = sub->NextSiblingElement("block"))
  {
    blocks.push_back(Block());
    parseBlock(sub, blocks.back());
  }
}
void parseBlock(TiXmlElement* element, Block* block)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  for (TiXmlElement* sub = element->FirstChildElement("line"); sub; sub = sub->NextSiblingElement("line"))
  {
    block->m_lines.push_back(Line());
    parseLine(sub, block->m_lines.back());
  }

//etc
}


Either way works. I've just found that using dynamic memory like that can lead to segfaults and memory leaking down the road (unless this is just a small one-off thing).
If you are building this as part of a larger program that other people may modify later, I would recommend only using new in a constructor or in a function that also has the delete.
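A common alternative that the thread does not mention is std::unique_ptr: heap objects can still be used, but each one is deleted automatically when its owner goes away, so no function needs a matching delete. A minimal sketch; the Word/Line types and the live counter are illustrative only, the counter just makes the cleanup observable:

```cpp
#include <memory>
#include <vector>

// Illustrative type; 'live' counts how many Word objects currently exist.
struct Word {
    static int live;
    Word() { ++live; }
    ~Word() { --live; }
};
int Word::live = 0;

struct Line {
    // Each unique_ptr owns its Word; destroying the Line destroys its Words.
    std::vector<std::unique_ptr<Word>> words;
};

// Allocates n words on the heap; there is no delete anywhere in this code.
Line makeLine(int n) {
    Line line;
    for (int i = 0; i < n; ++i)
        line.words.push_back(std::make_unique<Word>());
    return line;
}
```

std::make_unique requires C++14; with C++11, std::unique_ptr<Word>(new Word) does the same job.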
sarabande commented:
If you use vector<Block> instead of vector<Block*>, the parseBlock function needs to take its second argument by reference, not by pointer:

...
   parseBlock(sub, blocks.back()); // blocks.back() returns a reference to the new Block
  }
}
void parseBlock(TiXmlElement* element, Block& block)
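A small sketch of the "push_back, then fill through back()" pattern with a reference parameter. The XML input is replaced by plain integers (hypothetical parameters left and lineCount) so the example is self-contained. One caution that applies to this pattern in general: the reference returned by back() must be used before the next push_back on the same vector, because a push_back that reallocates invalidates earlier references.

```cpp
#include <vector>

struct Line  { int top = 0; };
struct Block { int left = 0; std::vector<Line> lines; };

// Fills a Block that is already stored in the caller's vector; because it
// takes Block&, the caller can pass blocks.back() directly.
void parseBlock(int left, int lineCount, Block& block) {
    block.left = left;
    for (int i = 0; i < lineCount; ++i) {
        block.lines.push_back(Line{});
        block.lines.back().top = i * 10;  // use the reference right away
    }
}

std::vector<Block> parseAll(const std::vector<int>& lefts) {
    std::vector<Block> blocks;
    for (int left : lefts) {
        blocks.push_back(Block{});
        // Safe: the reference is only used before the next push_back,
        // which could reallocate and invalidate it.
        parseBlock(left, 2, blocks.back());
    }
    return blocks;
}
```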


Nevertheless, as you already have a class 'Keywords', there is no need to turn to C function style. All the objects Block, Line, Word and Character share the same attributes of a rectangle, so the following class tree seems to fit:

struct Rectangle
{
    int left;
    int top;
    int right;
    int bottom;
    Rectangle() : left(0), top(0), right(0), bottom(0) { }
    Rectangle(int l, int t, int r, int b) : left(l), top(t), right(r), bottom(b) { }
};

class Base
{
    int id;
    Rectangle rect;
    std::vector<Base*> subs;
public:
    virtual ~Base()
    { while (subs.empty() == false) { delete subs[0]; subs.erase(subs.begin()); } }
    void setRectangle(TiXmlElement* obj);
    // defaults for the leaf class (Character), which has no sub-objects
    virtual Base * createSub() { return 0; }
    virtual std::string getSubName() { return ""; }
    bool parseSubs(const std::string & keyword, TiXmlElement* obj);
};

class Block : public Base
{
...
    Base * createSub() { return new Line; }
    std::string getSubName() { return "line"; }
};

...

class Word : public Base
{
    std::string value;
    int confidence;
    std::string font;
    int type;
public:
    ...
    Base * createSub() { return new Character; }
    std::string getSubName() { return "character"; }
    
};

If you do so, you can use the Base container std::vector<Base*> subs as the container for lines, words and characters, and implement parseSubs once so that it works for all four classes. New sub-objects are created by calling the virtual function createSub.

Note that the pointers in the container are deleted when the Base object is destroyed, so there is no need to worry about leaks.

Sara
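Sara's post leaves parseSubs unimplemented. A minimal sketch of how the virtual createSub/getSubName pair could drive a single recursive parse follows. Assumptions: a hypothetical Node struct stands in for TiXmlElement, setRectangle and parseSubs are merged into one parse function, the tag "word" for Line's children is assumed (Sara only shows "line" and "character"), and copying is disabled because a copied Base would delete the same sub-objects twice.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for TiXmlElement.
struct Node {
    std::string name;
    std::map<std::string, std::string> attrs;
    std::vector<Node> children;
};

struct Rectangle { int left = 0, top = 0, right = 0, bottom = 0; };

class Base {
public:
    Base() = default;
    Base(const Base&) = delete;             // a copy would delete the subs twice
    Base& operator=(const Base&) = delete;
    virtual ~Base() { for (Base* sub : subs) delete sub; }

    // One routine for every level: read the rectangle, then create a
    // sub-object for each child element whose tag matches getSubName().
    void parse(const Node& n) {
        rect.left = attr(n, "left");
        rect.top = attr(n, "top");
        rect.right = attr(n, "right");
        rect.bottom = attr(n, "bottom");
        for (const Node& child : n.children) {
            if (child.name != getSubName()) continue;
            Base* sub = createSub();
            if (!sub) continue;
            subs.push_back(sub);   // owned from here on, freed in ~Base
            sub->parse(child);
        }
    }

    const Rectangle& getRect() const { return rect; }
    const std::vector<Base*>& getSubs() const { return subs; }

protected:
    // Defaults for the leaf level (Character), which has no sub-objects.
    virtual Base* createSub() { return nullptr; }
    virtual std::string getSubName() const { return ""; }

private:
    static int attr(const Node& n, const std::string& key) {
        auto it = n.attrs.find(key);
        return it == n.attrs.end() ? 0 : std::atoi(it->second.c_str());
    }

    Rectangle rect;
    std::vector<Base*> subs;
};

class Character : public Base {};

class Word : public Base {
protected:
    Base* createSub() override { return new Character; }
    std::string getSubName() const override { return "character"; }
};

class Line : public Base {
protected:
    Base* createSub() override { return new Word; }
    std::string getSubName() const override { return "word"; }
};

class Block : public Base {
protected:
    Base* createSub() override { return new Line; }
    std::string getSubName() const override { return "line"; }
};
```

Child elements with any other tag are simply skipped, so unrelated elements inside a block do not disturb the tree.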