I am building a scientific manuscript application that lets authors enter their manuscript information online, where it is stored in a Microsoft SQL Server 2005 database. The data they enter (title, authors, and manuscript information) would then be displayed on the client's website as HTML and also used to generate PDFs on the fly.
My issues are the following:
1. What is the best way to have the client enter this data, given that the main issue is special foreign characters, scientific formulas, etc.? Should I provide them with HTML rich-text editor boxes and simply store the data as HTML?
2. I have thousands of existing PDF documents in this format that I need to load into the database. What would be the best way to extract this data while keeping its formatting intact?
I've attached a sample image showing what one entry would look like, including the special characters.