Solved

load XML file iteration

Posted on 2013-05-28
Last Modified: 2013-05-30
Hi,
I have an XML file (see the attached file).
The file is composed of blocks, lines, words and characters:

Every block is composed of 1,...,n lines
Every line is composed of 1,...,k words
Every word is composed of 1,...,l characters

I am trying to create objects as follows:
Block(int top, int left, int bottom, int right, vector<Line>)
Line(int top, int left, int bottom, int right, vector<Word>)
Word(int top, int left, int bottom, int right, vector<Character>)

I am using TinyXML with C++, but I can't link the objects together. My code can only handle one object (block, line, word or character) at a time.

void Keywords::checkChild(TiXmlElement *child)
{
       if(child)
        {

            if((string)child->Value() == "block")
            {
                cout << child->Value()<<endl;

                double x1 = atoi(child->Attribute("left"));
                double y1 = atoi(child->Attribute("top"));
                double x2 = atoi(child->Attribute("right"));
                double y2=  atoi(child->Attribute("bottom"));
              //Vector<Line>lineList
              //  blockList.push_back(newBlock(y1,x1,y2,x2,lineList));
            }


          checkChild(child->FirstChildElement());
          
          checkChild(child->NextSiblingElement());

        }///end if child
}



Thank you.
00000012-1-R.xml
Question by:HaniDaher
LVL 37

Expert Comment

by:TommySzalapski
ID: 39204521
You need to have a different function for each type (or if they all have the same attributes you could use templates).

Something like
void Keywords::checkBlock(TiXmlElement *child)
{
    // Only <block> elements are handled here.
    if (child && (string)child->Value() == "block")
    {
        cout << child->Value() << endl;

        double x1 = atoi(child->Attribute("left"));
        double y1 = atoi(child->Attribute("top"));
        double x2 = atoi(child->Attribute("right"));
        double y2 = atoi(child->Attribute("bottom"));
        blockList.push_back(newBlock(y1, x1, y2, x2));

        // Each child element of a block is handed to checkLine,
        // together with the block that was just added.
        child = child->FirstChildElement();
        while (child)
        {
            checkLine(child, blockList.back());
            child = child->NextSiblingElement();
        }
    }
}

void Keywords::checkLine(TiXmlElement *child, Block* block)
{
    // Only <line> elements are handled here.
    if (child && (string)child->Value() == "line")
    {
        cout << child->Value() << endl;

        double m = atoi(child->Attribute("slope")); // or whatever
        double x0 = atoi(child->Attribute("intercept"));
        block->m_line_list.push_back(newLine(m, x0));

        // Each child element of a line is handed to checkWord,
        // together with the line that was just added.
        child = child->FirstChildElement();
        while (child)
        {
            checkWord(child, block->m_line_list.back());
            child = child->NextSiblingElement();
        }
    }
}
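
One caveat on the snippets above: TinyXML's Attribute() returns a null pointer when the attribute is missing, and passing a null pointer to atoi is undefined behaviour. Below is a minimal sketch of a guard. The helper name attrToInt and the fallback parameter are made up for illustration and are not part of TinyXML.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of TinyXML): converts an attribute string
// to int, returning `fallback` when the pointer is null or no digits parse.
int attrToInt(const char* text, int fallback = 0)
{
    if (text == nullptr)
        return fallback;                 // attribute was missing
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (end == text)
        return fallback;                 // no digits were consumed
    return static_cast<int>(value);
}

// Small self-check of the behaviour described above.
void attrToIntSelfCheck()
{
    assert(attrToInt("42") == 42);
    assert(attrToInt(nullptr, -1) == -1);
}
```

It would be called as int x1 = attrToInt(child->Attribute("left")); in place of the bare atoi call.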
      

 

Author Comment

by:HaniDaher
ID: 39204588
Yes Tommy, that's what I thought. I actually managed to find the following solution:
void parseFile(TiXmlElement* document, vector<Block*>& blocks)
{
  for (TiXmlElement* sub = document->FirstChildElement("block"); sub; sub = sub->NextSiblingElement("block"))
    blocks.push_back(parseBlock(sub));
}
Block* parseBlock(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Line*> lines;
  for (TiXmlElement* sub = element->FirstChildElement("line"); sub; sub = sub->NextSiblingElement("line"))
    lines.push_back(parseLine(sub));
  return new Block(x1, ..., lines);
}
Line* parseLine(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Word*> words;
  for (TiXmlElement* sub = element->FirstChildElement("word"); sub; sub = sub->NextSiblingElement("word"))
    words.push_back(parseWord(sub));
  return new Line(x1, ..., words);
}
Word* parseWord(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Char*> chars;
  for (TiXmlElement* sub = element->FirstChildElement("char"); sub; sub = sub->NextSiblingElement("char"))
    chars.push_back(parseChar(sub));
  return new Word(x1, ..., chars);
}
Char* parseChar(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  return new Char(x1, ...);
}



I think it is basically the same idea as yours.
What do you think about the above code?
 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 250 total points
ID: 39204642
Yes, that is the same basic idea. Looks like it would work. Personally, I would try to avoid all those calls to new so you don't have to worry about cleaning up all the memory later.
Something like this
void parseFile(TiXmlElement* document, vector<Block>& blocks)
{
  for (TiXmlElement* sub = document->FirstChildElement("block"); sub; sub = sub->NextSiblingElement("block"))
  {
    blocks.push_back(Block());
    parseBlock(sub, blocks.back());
  }
}
void parseBlock(TiXmlElement* element, Block* block)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  for (TiXmlElement* sub = element->FirstChildElement("line"); sub; sub = sub->NextSiblingElement("line"))
  {
    block->m_lines.push_back(Line());
    parseLine(sub, block->m_lines.back());
  }

  // etc.
}



Either way works. I've just found that using dynamic memory like that can lead to segfaults and memory leaks down the road (unless this is just a small one-off thing).
If you are building this as part of a larger program that other people may modify later, I would recommend using new only in a constructor or in a function that also contains the matching delete.
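
To illustrate the point about avoiding new, here is a minimal sketch in which every level stores its children by value, so the vectors release everything when they go out of scope. The member names (m_lines, m_words, m_chars) and the helper functions are assumptions made for the example, and the XML parsing itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Char { int left = 0, top = 0, right = 0, bottom = 0; };

struct Word
{
    int left = 0, top = 0, right = 0, bottom = 0;
    std::vector<Char> m_chars;   // children owned by value
};

struct Line
{
    int left = 0, top = 0, right = 0, bottom = 0;
    std::vector<Word> m_words;
};

struct Block
{
    int left = 0, top = 0, right = 0, bottom = 0;
    std::vector<Line> m_lines;
};

// Counts all characters by walking the nested vectors.
std::size_t countChars(const std::vector<Block>& blocks)
{
    std::size_t total = 0;
    for (const Block& b : blocks)
        for (const Line& l : b.m_lines)
            for (const Word& w : l.m_words)
                total += w.m_chars.size();
    return total;
}

// Builds a tiny tree (1 block, 1 line, 1 word, 2 chars) without calling new.
std::vector<Block> makeSample()
{
    std::vector<Block> blocks;
    blocks.push_back(Block());
    Block& block = blocks.back();        // reference to the element just added
    block.m_lines.push_back(Line());
    Line& line = block.m_lines.back();
    line.m_words.push_back(Word());
    Word& word = line.m_words.back();
    word.m_chars.push_back(Char());
    word.m_chars.push_back(Char());
    assert(countChars(blocks) == 2);     // sanity check on the sample
    return blocks;
}
```

Note that a reference obtained from back() is only guaranteed to stay valid until the same vector grows again, so each element should be filled completely before the next one is pushed.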
 
LVL 34

Accepted Solution

by:
sarabande earned 250 total points
ID: 39205123
If you use vector<Block> instead of vector<Block*>, the parseBlock function needs to take its second argument by reference rather than by pointer:

...
   parseBlock(sub, blocks.back()); // blocks.back() returns a reference to the new Block
  }
}
void parseBlock(TiXmlElement* element, Block& block)



Nevertheless, as you already have a class 'Keywords', there is no need to fall back to C-style free functions. All the objects (Block, Line, Word, Character) share the attributes of a rectangle, so the following class tree seems to fit:

struct Rectangle
{
    int left;
    int top;
    int right;
    int bottom;
    Rectangle() : left(0), top(0), right(0), bottom(0) { }
    Rectangle(int l, int t, int r, int b) : left(l), top(t), right(r), bottom(b) { }
};

class Base
{
    int id;
    Rectangle rect;
    std::vector<Base*> subs;
public:
    virtual ~Base()
    { while (subs.empty() == false) { delete subs[0]; subs.erase(subs.begin()); } }
    void setRectangle(TiXmlElement* obj);
    virtual Base * createSub();
    virtual std::string getSubName();
    bool parseSubs(const std::string & keyword, TiXmlElement* obj);
};

class Block : public Base
{
...
    Base * createSub() { return new Line; }
    std::string getSubName() { return "line"; }
};

...

class Word : public Base
{
    std::string value;
    int confidence;
    std::string font;
    int type;
public:
    ...
    Base * createSub() { return new Character; }
    std::string getSubName() { return "character"; }
    
};



If you do so, you can use the Base container std::vector<Base*> subs to hold lines, words and characters, and implement the function parseSubs so that it works for all four classes. You would create new pointers of the 'sub' class by calling the virtual function createSub.

Note that the pointers in the container are deleted when the Base object is destroyed, so there is no need to worry about leaks.
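
A minimal sketch of how parseSubs could be written once in Base, under some loud assumptions: to keep it self-contained, a hypothetical Node struct stands in for TiXmlElement, the rectangle is reduced to a single left field, and parseSubs takes only the node. With TinyXML the loop would walk FirstChildElement(name) and NextSiblingElement(name) instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for TiXmlElement: tag name, one attribute, children.
struct Node
{
    std::string name;
    int left;
    std::vector<Node> children;
};

class Base
{
    int left = 0;
    std::vector<Base*> subs;             // owned; freed in the destructor
public:
    Base() = default;
    Base(const Base&) = delete;          // owning raw pointers: forbid copies
    Base& operator=(const Base&) = delete;
    virtual ~Base() { for (Base* s : subs) delete s; }

    virtual Base* createSub() { return nullptr; }      // leaf by default
    virtual std::string getSubName() { return ""; }

    // Reads this object's data, then creates and parses one sub-object
    // for every child whose name matches getSubName().
    void parseSubs(const Node& node)
    {
        left = node.left;
        const std::string subName = getSubName();
        for (const Node& child : node.children)
        {
            if (child.name != subName) continue;   // skip foreign elements
            Base* sub = createSub();
            if (!sub) return;                      // leaf class: nothing to create
            subs.push_back(sub);
            sub->parseSubs(child);                 // recurse one level down
        }
    }

    std::size_t subCount() const { return subs.size(); }
    const Base& sub(std::size_t i) const { assert(i < subs.size()); return *subs[i]; }
    int getLeft() const { return left; }
};

class Character : public Base {};

class Word : public Base
{
public:
    Base* createSub() override { return new Character; }
    std::string getSubName() override { return "character"; }
};

class Line : public Base
{
public:
    Base* createSub() override { return new Word; }
    std::string getSubName() override { return "word"; }
};

class Block : public Base
{
public:
    Base* createSub() override { return new Line; }
    std::string getSubName() override { return "line"; }
};
```

Calling parseSubs on a Block with the node of a block element then builds the whole subtree, and destroying the Block releases it again.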

Sara