How do I extract only the contents of the paragraph tags in an XML file?

Posted on 2007-10-13
Last Modified: 2010-04-15
I would like to parse, if possible, the contents (all the words) out of the paragraph tags in an XML file. I would then like to tokenize these words, eliminate duplicates, and store the words in an array.
Using generic C libraries (string.h, ctype.h, stdio.h) if possible.
Your help is greatly appreciated.
Question by: mateo6281

    Accepted Solution

    Very funny. Not satisfied with the use of libraries?
    Fine, then do it the hard way.
    Opening a file can be done with fopen,
    reading with fgets,
    finding a string in another one with strstr.
    So combining them gives:
    #include <stdio.h>
    #include <string.h>

    #define MAX_LINES 100
    #define LINE_LEN 2048

    char *words_in_paragraph[1000] = {0}; /* not used yet: room for the tokens */
    char lines_in_paragraph[MAX_LINES][LINE_LEN];

    int main (void) {
      const char *file_name = "test_data.xml";
      const char *tag_to_find = "<p>";
      const char *end_tag_to_find = "</p>";
      char buf[LINE_LEN];
      char *begin_tag_found, *end_tag_found;
      int in_tag_to_find, i, n;
      FILE *in;

      in = fopen(file_name, "r");
      if (NULL == in) {
        perror(file_name);
        return 1;
      }
      while (fgets(buf, sizeof buf, in) != NULL) {
        begin_tag_found = strstr(buf, tag_to_find);
        if (NULL == begin_tag_found) {
          continue; /* no <p> on this line */
        }
        /* remove the <p> and anything before it from the line */
        begin_tag_found += strlen(tag_to_find);
        memmove(buf, begin_tag_found, strlen(begin_tag_found) + 1);
        in_tag_to_find = 1;
        for (n = 0; n < MAX_LINES && in_tag_to_find; n++) {
          end_tag_found = strstr(buf, end_tag_to_find);
          if (NULL != end_tag_found) {
            *end_tag_found = '\0'; /* remove the </p> and the rest of the line */
            in_tag_to_find = 0;
          }
          strcpy(lines_in_paragraph[n], buf);
          if (in_tag_to_find && NULL == fgets(buf, sizeof buf, in)) {
            in_tag_to_find = 0; /* end of file inside a paragraph */
          }
        }
        for (i = 0; i < n; i++) {
          printf("lines[%d] = %s\n", i, lines_in_paragraph[i]);
          memset(lines_in_paragraph[i], 0, LINE_LEN); /* clear the entries */
        }
      }
      fclose(in);
      return 0;
    }

    There's hardly any error handling in it, but I'm sure you can fix that.
    Output is:
    lines[0] = I'm enclose in p-tags
    lines[0] = I'm enclosed too
    lines[0] = over a few lines

    lines[1] = another line

    The blank line after lines[0] is OK, because I have not removed the trailing \n. Now you have the lines of the paragraph in lines_in_paragraph.
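
    If the trailing \n gets in the way, one possible way to drop it is sketched here. The helper name strip_newline is made up for this example; strcspn itself is standard C from string.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the code above): cut the line at the
   first '\r' or '\n'. A line without a line ending is left unchanged,
   because strcspn then returns the index of the terminating '\0'. */
void strip_newline(char *line) {
  assert(line != NULL);
  line[strcspn(line, "\r\n")] = '\0';
}
```

    Calling strip_newline(buf) right after each successful fgets would be one place to use it.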

    Splitting them into tokens can now be done with strtok or some similar means.

    I think it's now your turn to fix the problems in the above code and add a few lines of code yourself.
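
    As a rough sketch of the strtok route, something like the following could collect the words without duplicates. The names add_unique_words and MAX_WORDS, and the choice of delimiter characters, are assumptions made for this example. It also uses malloc from stdlib.h to keep a copy of each word, and the comparison is case-sensitive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1000 /* assumed upper bound on distinct words */

/* Hypothetical helper: split `line` in place on whitespace and some
   punctuation, and append a copy of every word that is not yet in `words`.
   `count` is the number of words already stored; the new count is returned. */
size_t add_unique_words(char *line, char *words[], size_t count) {
  const char *delims = " \t\r\n.,;:!?\"()";
  char *tok;
  size_t i;

  assert(line != NULL && words != NULL);
  for (tok = strtok(line, delims); tok != NULL; tok = strtok(NULL, delims)) {
    for (i = 0; i < count; i++) {
      if (strcmp(words[i], tok) == 0) {
        break; /* already stored */
      }
    }
    if (i == count && count < MAX_WORDS) {
      char *copy = malloc(strlen(tok) + 1);
      if (copy == NULL) {
        return count; /* out of memory: keep what we have so far */
      }
      strcpy(copy, tok);
      words[count++] = copy;
    }
  }
  return count;
}
```

    Start with a count of 0 and pass the returned count back in for each further line. Note that strtok modifies the line it is given, and the copies should be released with free when they are no longer needed.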


    Author Comment

    Thanks a lot.......... greatly appreciated
