Solved

Can I add hundreds of articles to WordPress?

Posted on 2013-11-18
Last Modified: 2013-11-20
I have many hundreds of articles in MS Word format going back ~24 years. The formatting is simple (mostly just text; some hyperlinks) and I have created some VBA code to convert them to clean HTML. Most convert fully, and prompts can enter publication date, keywords, categories, etc. The process works fine for creating simple HTML web pages.

I would like to use WordPress to manage the articles. I assume I would need to modify my VBA a bit to create different HTML-like files, but I'm unclear about how WordPress would deal with a folder full of structured files to present them as if they were dated blog posts with categories -- and allow a reader to search on the content.

When I examine the source code for sample themes and blogs I like, the content seems to be wrapped within the WordPress theme code, but are the individual posts being assembled from a database?

What do I need with each article to make it conform to whatever standards WordPress needs for including it? Are there resources showing this?

Perhaps more importantly, am I on the right track here or do I need something more complicated?
Question by:EricFletcher
 
LVL 70

Expert Comment

by:Jason C. Levine
ID: 39658189
code, but are the individual posts being assembled from a database?

Yes.  The best way to do this is to import the articles using a CSV file and a plugin:

http://wordpress.org/plugins/wp-ultimate-csv-importer/

If you have the HTML, you are more than halfway there.  You need to remove all of the line breaks for optimum importing and add titles and other meta-info (categories, tags, etc) and then you would be good to go.

Alternately, just copy/paste the HTML and create new posts by hand.
 
LVL 21

Author Comment

by:EricFletcher
ID: 39659237
Copy & paste the HTML isn't really an option because of the volume and the ongoing need.

The files I want to import are text files, so is a CSV import really suitable? I could remove the line breaks, but that would make each article one very long string.

What I was hoping for is a way to structure the individual articles with whatever WordPress would need to process them as if they had been entered as individual posts. If WP is assembling from a database, wouldn't I be able to just add each article as another "record" in the database? If that is possible, could an FTP tool then be used to move the files to the suitable location for WP?

My objective is to get all of the content online in a way that someone can search on all of the text (all articles), but also to be able to view by date or category. Blog structures seem to do this, and several WP themes seem to have the add-on tools to support what we need.

As noted, most are quite simple HTML. This example from 2002 is typical: <meta>, <title>, <body>, <h2> and some <p> elements. More recent ones have additional meta data added for keywords.

The method I set up many years ago worked okay, but it was becoming hard to manage because I needed to add to the index each time. More troubling, with all of the articles saved on a web site, we found that they were being "mined" and sold to students. A database-driven method might not eliminate this, but it should make it more complicated.
 
LVL 70

Expert Comment

by:Jason C. Levine
ID: 39659533
Yes, CSV is still the best way. Because everything has to get into the database, FTP is not an option as WordPress has no ability to read text files from a directory.  So yes, you need to add each article as a record in the content database, but the way to do that is as an import. Your title tag will become the title of the post and the contents of the body tag (minus the actual body tag) will become the post body. Your meta tags will be largely useless in a CMS and can be removed.
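
The title/body mapping described above can be sketched in a few lines. This is only an illustration in Python (the same logic could be ported to the VBA routine); it assumes simple, well-formed pages like the ones described in the question, with one <title> and one <body> element. The function name `split_article` and the sample page are made up for the example.

```python
import re

# Regular expressions are adequate here only because the pages are simple
# and machine-generated; arbitrary HTML would need a real parser.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body[^>]*>(.*)</body>", re.IGNORECASE | re.DOTALL)


def split_article(page: str) -> tuple[str, str]:
    """Return (title, body_html) for one simple HTML page.

    The <body> tags themselves are dropped; only their contents are kept.
    Anything in <head> other than the title (meta tags etc.) is ignored.
    """
    title_match = TITLE_RE.search(page)
    body_match = BODY_RE.search(page)
    if title_match is None or body_match is None:
        raise ValueError("page has no <title> or no <body> element")
    return title_match.group(1).strip(), body_match.group(1).strip()


# Hypothetical sample page, shaped like the 2002-era articles described.
sample = """<html><head><meta name="keywords" content="maps">
<title>Reading a Map</title></head>
<body>
<h2>Reading a Map</h2>
<p>Start with the legend.</p>
</body></html>"""

title, body_html = split_article(sample)
print(title)
print(body_html)
```

The title would feed the post title and the body fragment the post content; the meta tags are simply never read.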

WordPress, like most content management systems, doesn't create individual files on the web server for each post.  Instead, there is one template file that gets populated dynamically by calls to the database.  The end user doesn't see a difference but you need to flatten your files into a single import for adding to the database. You could create a static web site from your individual files but that would require more work on your end to create the design, navigation elements, etc.

Converting to a CMS isn't going to change the data mining you describe unless you are planning to password-protect the posts.  People can still copy-paste the text.

The only other alternative would be to import your original Word files as WordPress media attachments but this is a pretty poor solution as you would lose out on most of the advantages of using WordPress.

 
LVL 21

Author Comment

by:EricFletcher
ID: 39659875
Okay, I think I'm starting to get my head around this now! I'd need to structure my articles to fit a CSV format so the import plug-in you've recommended could manage the bulk loading.

I looked through the plug-in description and can see that they map the various elements of a WordPress posting to CSV 'fields'. So does that mean that their plug-in would be expecting data with comma-separated quoted entries with at minimum the post_title (from my current <title> content), post_content (as a single "line" of all the <p> and other html elements that make up the article body), and a post_date (which I can derive from the publication date)? I can see how I could alter my VBA code to add other elements during processing (i.e. for an excerpt, categories, tags, etc.) but will I need to have an entry for each one of the elements of a WordPress post?

I am still a bit unsure about how the text of an article would be converted to a valid CSV format. Most articles are typically ~1K words, so this would mean that the post_content part of a record would be ~5,000 characters: is there any limitation to the length of a given field within a CSV record?

As well, I gather from questions in their support forums that the importer expects comma- or semicolon-delimited fields without embedded quotes. Many of our articles use quotes in the body. These are almost always the typographic symbols “ ” ‘ and ’ instead of " and '. My VBA routine could change them to codes like &#8220; (for the “ for example) and change any that should be kept as " and ' (typically only for lat/long references) to &#34; and &#39;.
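
As a rough sketch of that substitution, here it is in Python rather than VBA. The function names are invented for the example. The straight-quote helper is meant for plain text only (for example a lat/long string); running it over markup would also rewrite the quotes around attribute values.

```python
# Typographic quote characters mapped to numeric HTML entities.
QUOTE_ENTITIES = {
    "\u201c": "&#8220;",  # left double quotation mark
    "\u201d": "&#8221;",  # right double quotation mark
    "\u2018": "&#8216;",  # left single quotation mark
    "\u2019": "&#8217;",  # right single quotation mark / apostrophe
}
QUOTE_TABLE = str.maketrans(QUOTE_ENTITIES)


def encode_typographic_quotes(text: str) -> str:
    """Replace curly quotes with numeric entities; safe on whole HTML fragments."""
    return text.translate(QUOTE_TABLE)


def encode_straight_quotes(text: str) -> str:
    """Replace " and ' with entities. Use on text only, never on tags."""
    return text.replace('"', "&#34;").replace("'", "&#39;")


print(encode_typographic_quotes("\u201cHi,\u201d she said; it\u2019s \u2018fine\u2019"))
print(encode_straight_quotes("45 25' 15\" N"))
```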

However, I am unsure about how I would be able to include the required quote marks within any embedded HTML. When I tested how an article might be presented in CSV by parsing its content to cells in an Excel sheet, an <a href="mailto: element within the post_content column was converted to <a href=""mailto: (with double quotes) in the CSV file version. I realize this is necessary for the CSV format, but will the plug-in importer resolve them back to valid HTML? (And BTW, the “ type symbols went through without creating a problem in the CSV records.)
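
For reference, doubling an embedded " inside a quoted field is the standard CSV convention (it is what RFC 4180 describes), and a conforming reader turns "" back into a single ". The sketch below uses Python's csv module only to show that round trip; it says nothing about how any particular WordPress import plugin parses the file, which still has to be tested.

```python
import csv
import io

# One record: a title and a body fragment containing a double-quoted attribute.
row = ["Contact us", '<p>Write to <a href="mailto:someone@somewhere.com">us</a>.</p>']

# Write the record with every field quoted; embedded " characters are doubled.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(row)
encoded = buffer.getvalue()
print(encoded)

# Read it back; a conforming CSV reader restores the original HTML.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded[1])
```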

I appreciate your assistance with this and have increased the points.
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 39660124
Your 'meta keywords' are not going to survive the import either.  Your <body> text is likely to be the only thing imported.
 
LVL 70

Accepted Solution

by:
Jason C. Levine earned 500 total points
ID: 39660517
I looked through the plug-in description and can see that they map the various elements of a WordPress posting to CSV 'fields'. So does that mean that their plug-in would be expecting data with comma-separated quoted entries with at minimum the post_title (from my current <title> content), post_content (as a single "line" of all the <p> and other html elements that make up the article body), and a post_date (which I can derive from the publication date)? I can see how I could alter my VBA code to add other elements during processing (i.e. for an excerpt, categories, tags, etc.) but will I need to have an entry for each one of the elements of a WordPress post?

Yes.  You are basically creating the flat-file version of the WordPress database as a CSV.  The plugin handles the import and also sends the various elements to the appropriate mysql tables.
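
A minimal sketch of such a flat file, in Python for illustration. The three column names are the ones mentioned in the question; whether the plugin expects exactly these headers, and which date format it accepts, are assumptions to check against its documentation. The `flatten` and `write_posts` helpers and the sample article are invented for the example.

```python
import csv
import io
import re

# Column names taken from the question; the plugin's mapping is an assumption.
FIELDNAMES = ["post_title", "post_content", "post_date"]


def flatten(html_fragment: str) -> str:
    """Collapse line breaks and runs of whitespace into single spaces.

    Fine for <p>/<h2>-style articles; it would damage <pre> blocks.
    """
    return re.sub(r"\s+", " ", html_fragment).strip()


def write_posts(articles, out_file) -> None:
    """Write one header row, then one quoted CSV record per article."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for article in articles:
        writer.writerow({
            "post_title": article["title"],
            "post_content": flatten(article["body_html"]),
            "post_date": article["published"],
        })


# Hypothetical article; the date uses the YYYY-MM-DD HH:MM:SS form.
articles = [{
    "title": "Reading a Map",
    "body_html": "<h2>Reading a Map</h2>\n<p>Start with\nthe legend.</p>",
    "published": "2002-03-15 00:00:00",
}]

buffer = io.StringIO()
write_posts(articles, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Extra columns (excerpt, categories, tags) would be added to `FIELDNAMES` in the same way once the plugin's expected headers are known.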

importer expects comma- or semicolon-delimited fields without embedded quotes.

Yes, the Word "fancy quotes" will need to be converted to HTML entities or removed and replaced with their non-fancy equivalents (which is my usual preference -  I hate those damn things).

When I tested how an article might be presented in CSV by parsing its content to cells in an Excel sheet, an <a href="mailto: element within the post_content column was converted to <a href=""mailto: (with double quotes) in the CSV file version. I realize this is necessary for the CSV format, but will the plug-in importer resolve them back to valid HTML?

It might; you will have to test.  There is an older CSV import plugin that I know recognized "" as an escaped ", but I prefer to avoid the issue entirely and replace

href="mailto:someone@somewhere.com" 



with

href='mailto:someone@somewhere.com'



That allows me to have simple "," delimiters in the CSV.
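
One way to automate that swap, sketched in Python. The function name is invented, and the approach is an assumption about how the replacement might be scripted, not something the plugin requires. It only rewrites quotes inside tags, and it leaves a value alone if it already contains an apostrophe, since single-quoting that value would break the attribute.

```python
import re

# A whole tag, e.g. <a href="...">; text between tags is never touched.
TAG_RE = re.compile(r"<[^<>]+>")
# name="value" where the value contains neither kind of quote.
ATTR_RE = re.compile(r'(\s[\w:-]+)="([^"\']*)"')


def single_quote_attributes(fragment: str) -> str:
    """Rewrite name="value" as name='value' inside every tag of the fragment."""
    def fix_tag(match: re.Match) -> str:
        return ATTR_RE.sub(r"\1='\2'", match.group(0))
    return TAG_RE.sub(fix_tag, fragment)


print(single_quote_attributes('<a href="mailto:someone@somewhere.com">mail</a>'))
```

Straight double quotes in the article text itself are not handled here; those would still need to become entities such as &#34; before the field contains no " at all.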

I am still a bit unsure about how the text of an article would be converted to a valid CSV format. Most articles are typically ~1K words, so this would mean that the post_content part of a record would be ~5,000 characters: is there any limitation to the length of a given field within a CSV record?

No limit.  Yes, this will be a big CSV file :)

Every server handles script execution timeouts in a different way.  I would break up the CSV file into multiple files of 100-250 rows each and import them one at a time.
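
Splitting the big file can be scripted as well. The sketch below is in Python with invented names; it reads records with a CSV parser rather than counting text lines, so a quoted field that happens to contain a line break still counts as one row, and it repeats the header row at the top of every chunk.

```python
import csv
import io
from itertools import islice


def split_rows(reader, rows_per_file):
    """Yield lists of rows: the header followed by up to rows_per_file records."""
    header = next(reader)
    while True:
        chunk = list(islice(reader, rows_per_file))
        if not chunk:
            break
        yield [header] + chunk


# Small in-memory demonstration: 5 records split into chunks of 2.
source = io.StringIO(
    '"post_title","post_content"\r\n'
    '"A","<p>one</p>"\r\n'
    '"B","<p>two</p>"\r\n'
    '"C","<p>three</p>"\r\n'
    '"D","<p>four</p>"\r\n'
    '"E","<p>five</p>"\r\n'
)
chunks = list(split_rows(csv.reader(source), rows_per_file=2))
for number, chunk in enumerate(chunks, start=1):
    print(f"chunk {number}: {len(chunk) - 1} records")
```

In practice each chunk would be written to its own file with `csv.writer(..., quoting=csv.QUOTE_ALL).writerows(chunk)`, with `rows_per_file` set somewhere in the 100-250 range suggested above.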
 
LVL 21

Author Comment

by:EricFletcher
ID: 39662639
Thanks for your detailed help Jason1178. It looks like my next step is to give it a try! I didn’t know that I could use the ' character as a delimiter in an HTML tag, so that will overcome the remaining issue I was anticipating with the CSV format.

I can tweak my VBA script to massage our existing HTML articles into the CSV format, so I’ll start with a few of them to see how it goes. More recent ones in Word-only format can be handled in batches to keep the CSV file sizes reasonable. For new articles, I think I will manage it by creating a script to convert the Word document into suitable HTML so it can just be pasted directly into WordPress rather than use an importer to batch them.

Oh, and about the quotes... The typographer in me abhors the use of foot and inch symbols used within text instead of the correct typographic quote marks—and so much so that I’m often unaware of my fingers typing the necessary Alt sequences in editors without autocorrect to manage it for me. However, when I write code, my normally-on autocorrect function is definitely a big hassle!