How do I extract the contents of only the paragraph tags in an XML file?

Posted on 2007-10-13
Medium Priority
Last Modified: 2010-04-15
I would like to parse, if possible, the contents (all the words) out of the paragraph tags in an XML file. I would then like to tokenize these words, eliminate duplicates, and store the words in an array.
Using generic C libraries (string.h, ctype.h, stdio.h) if possible.
Your help is greatly appreciated.
Question by:mateo6281

Accepted Solution

fridom earned 1500 total points
ID: 20073102
Very funny, not satisfied with the use of libraries?
So fine, then do it the hard way.
Opening a file can be done with fopen,
reading with fgets,
finding a string in another one with strstr.
So combining them gives:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX_LINES 100
#define LINE_LEN 2048

/* Assumes a plain "<p>" without attributes and at most one
   paragraph start per input line. */
char *words_in_paragraph[1000] = {0}; /* for the tokenizing step, not filled here */
char lines_in_paragraph[MAX_LINES][LINE_LEN];

int main (void) {
  char *file_name = "test_data.xml";
  char buf[LINE_LEN];
  char *pc, *begin_tag_found, *end_tag_found;
  char *tag_to_find = "<p>";
  char *end_tag_to_find = "</p>";
  int in_tag_to_find, i, n;
  FILE *in;

  in = fopen(file_name, "r");
  if (NULL == in) {
    perror(file_name);
    return EXIT_FAILURE;
  }
  while (fgets(buf, sizeof buf, in) != NULL) {
    begin_tag_found = strstr(buf, tag_to_find);
    if (NULL == begin_tag_found) {
      continue; /* this line does not start a paragraph */
    }
    /* remove everything up to and including the <p> */
    pc = begin_tag_found + strlen(tag_to_find);
    memmove(buf, pc, strlen(pc) + 1);
    in_tag_to_find = 1;
    /* collect lines until </p>, end of file, or MAX_LINES lines;
       anything beyond MAX_LINES is dropped */
    for (n = 0; n < MAX_LINES && in_tag_to_find; n++) {
      end_tag_found = strstr(buf, end_tag_to_find);
      if (NULL != end_tag_found) {
        *end_tag_found = '\0'; /* cut the line at the </p> */
        in_tag_to_find = 0;
      }
      strcpy(lines_in_paragraph[n], buf);
      if (in_tag_to_find && fgets(buf, sizeof buf, in) == NULL) {
        in_tag_to_find = 0; /* end of file inside a paragraph */
      }
    }
    /* n is now the number of lines stored for this paragraph */
    for (i = 0; i < n; i++) {
      printf("lines[%d] = %s\n", i, lines_in_paragraph[i]);
      memset(lines_in_paragraph[i], 0, LINE_LEN); /* clear the entries */
    }
  }
  fclose(in);
  return 0;
}

There's nearly no error handling in it but I'm sure you can fix that
Output is:
lines[0] = I'm enclose in p-tags
lines[0] = I'm enclosed too
lines[0] = over a few lines

lines[1] = another line

The blank lines are OK: they are there because I have not removed the trailing \n. Now you have the lines of the paragraph in lines_in_paragraph.
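The trailing \n that fgets leaves at the end of a line can be cut off before printing or tokenizing. A minimal sketch using strcspn from string.h; the helper name chomp is hypothetical and not part of the code above:

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n', if there is one.
   Returns the same pointer so the call can be nested. */
char *chomp(char *line) {
  line[strcspn(line, "\r\n")] = '\0';
  return line;
}
```

Calling it on each stored line before the printf should make the blank lines in the output disappear.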

Splitting them into tokens can now be done with strtok or some similar means.
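One possible way to tokenize a line with strtok and keep only words not seen before is sketched below. The names add_unique_words, is_known and MAX_WORDS, and the delimiter set, are assumptions made for illustration; the comparison is case-sensitive, and duplicates are found with a simple linear search, which should be fine for a few thousand words:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1000

/* Returns 1 if word is already stored in words[0..count-1], else 0. */
static int is_known(char *words[], size_t count, const char *word) {
  size_t i;
  for (i = 0; i < count; i++) {
    if (strcmp(words[i], word) == 0) return 1;
  }
  return 0;
}

/* Tokenize line with strtok (which overwrites the delimiters in line
   with '\0') and append every word not seen before to words[] as a
   malloc'ed copy. Returns the new word count. Stops early if words[]
   is full or malloc fails. */
size_t add_unique_words(char *line, char *words[], size_t count) {
  const char *delims = " \t\r\n.,;:!?\"()";
  char *tok;
  for (tok = strtok(line, delims); tok != NULL; tok = strtok(NULL, delims)) {
    char *copy;
    if (is_known(words, count, tok)) continue; /* duplicate, skip it */
    if (count >= MAX_WORDS) break;             /* array is full */
    copy = malloc(strlen(tok) + 1);
    if (copy == NULL) break;
    strcpy(copy, tok);
    words[count++] = copy;
  }
  return count;
}
```

The idea is to call it once per stored line, starting with a count of 0 and passing the returned count into the next call, with an array of 1000 char pointers such as words_in_paragraph as the destination. Each stored word should be released with free when it is no longer needed.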

I think it's now your turn to fix the problems in the above code and add a few lines of code yourself


Author Comment

ID: 20073303
Thanks a lot.......... greatly appreciated
