Solved

Fetching data from a webpage and inserting to excel with Java

Posted on 2013-10-28
9
353 Views
Last Modified: 2013-12-18
Hello,

I need to fetch data from the following url and insert it into excel.

http://apps.who.int/classifications/icd10/browse/2010/en

Manual entry of data is tedious by copy pasting. Was wondering, if I could accomplish the same using a simple java program.

Please find attached herewith a sample excel file. Would appreciate if someone could guide me in the correct direction as to how to proceed with this. The data need to be taken in the manner as shown in the excel.

Any help which could help me accomplish this perfectly the way in which I have demonstrated in excel, would be much appreciated with points surely.

Any queries, please revert.

Best Regards,
Anees
sample-data.xlsx
0
Comment
Question by:Sreejith22
  • 5
  • 3
9 Comments
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39604946
The ICD-10 code set contains from 14,400 to 76,000 codes with descriptive text.  http://en.wikipedia.org/wiki/ICD-10   Putting them all in one spreadsheet is going to make the spreadsheet slow to load and cumbersome to use.  In addition, your sample spreadsheet did not include the important descriptive text that tells what the code refers to.

While I'm sure that there is probably a way to capture all that info, you might want to think about how you want to use it.  I would just download that entire web site for local use.  Or maybe even just use it as it is on the web.  It seems to have a pretty good search function that would be better than what you would find in Excel.
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 39604976
0
 

Author Comment

by:Sreejith22
ID: 39605422
I followed the link CEHJ provided and it brings me one step closer to what I want.

@DaveBaldwin - For some reason, I need to get this data inside excel badly. With just the three columns I showed and nothing extra I need.

I have attached here the xml which I downloaded from the link which CEHJ gave.

<Class code="A00-A09" kind="block">
		<SuperClass code="I"/>
		<SubClass code="A00"/>
		<SubClass code="A01"/>
		<SubClass code="A02"/>
		<SubClass code="A03"/>
		<SubClass code="A04"/>
		<SubClass code="A05"/>
		<SubClass code="A06"/>
		<SubClass code="A07"/>
		<SubClass code="A08"/>
		<SubClass code="A09"/>
		<Rubric id="D0000002" kind="preferred">
			<Label xml:lang="en" xml:space="default">Intestinal infectious diseases</Label>
		</Rubric>
	</Class>
	<Class code="A00" kind="category">
		<Meta name="MortBCode" value="001"/>
		<Meta name="MortL4Code" value="4-002"/>
		<Meta name="MortL3Code" value="3-003"/>
		<Meta name="MortL2Code" value="2-001"/>
		<Meta name="MortL1Code" value="1-002"/>
		<SuperClass code="A00-A09"/>
		<SubClass code="A00.0"/>
		<SubClass code="A00.1"/>
		<SubClass code="A00.9"/>
		<Rubric id="D0000003" kind="preferred">
			<Label xml:lang="en" xml:space="default">Cholera</Label>
		</Rubric>
	</Class>
	<Class code="A00.0" kind="category">
		<Meta name="MortBCode" value="001"/>
		<Meta name="MortL4Code" value="4-002"/>
		<Meta name="MortL3Code" value="3-003"/>
		<Meta name="MortL2Code" value="2-001"/>
		<Meta name="MortL1Code" value="1-002"/>
		<SuperClass code="A00"/>
		<Rubric id="D0000004" kind="preferred">
			<Label xml:lang="en" xml:space="default">Cholera due to Vibrio cholerae 01, biovar cholerae</Label>
		</Rubric>
		<Rubric id="D0000934" kind="inclusion">
			<Label xml:lang="en" xml:space="default">Classical cholera</Label>
		</Rubric>
	</Class>
	<Class code="A00.1" kind="category">
		<Meta name="MortBCode" value="001"/>
		<Meta name="MortL4Code" value="4-002"/>
		<Meta name="MortL3Code" value="3-003"/>
		<Meta name="MortL2Code" value="2-001"/>
		<Meta name="MortL1Code" value="1-002"/>
		<SuperClass code="A00"/>
		<Rubric id="D0000005" kind="preferred">
			<Label xml:lang="en" xml:space="default">Cholera due to Vibrio cholerae 01, biovar eltor</Label>
		</Rubric>
		<Rubric id="D0000935" kind="inclusion">
			<Label xml:lang="en" xml:space="default">Cholera eltor</Label>
		</Rubric>
	</Class>
	<Class code="A00.9" kind="category">
		<Meta name="MortBCode" value="001"/>
		<Meta name="MortL4Code" value="4-002"/>
		<Meta name="MortL3Code" value="3-003"/>
		<Meta name="MortL2Code" value="2-001"/>
		<Meta name="MortL1Code" value="1-002"/>
		<SuperClass code="A00"/>
		<Rubric id="D0000006" kind="preferred">
			<Label xml:lang="en" xml:space="default">Cholera, unspecified</Label>
		</Rubric>
	</Class>

Open in new window


The above given code is an excerpt from the attached xml. From the attached xml, can someone let me know how I can fetch the the three data as shown in the excel file attached in the question.

Any help is appreciated.
icd102010en.xml
0
Resolve Critical IT Incidents Fast

If your data, services or processes become compromised, your organization can suffer damage in just minutes and how fast you communicate during a major IT incident is everything. Learn how to immediately identify incidents & best practices to resolve them quickly and effectively.

 
LVL 86

Expert Comment

by:CEHJ
ID: 39605454
You can't really. A spreadsheet is not for presenting hierarchical data - it's for presenting flat data. If you want to force it into a spreadsheet, then you need to force the hierarchy flat - and then preferably use csv
0
 

Author Comment

by:Sreejith22
ID: 39605478
I do not need to have a hierarchical representation of data in the excel file. Just a code and the corresponding full name, classification and sub-classification.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39605596
You'll need to parse the ClaML  xml. And you're going to need the DTD:

http://clam.eggbird.eu/download/EN14463-2007.dtd
0
 

Author Comment

by:Sreejith22
ID: 39608151
can someone please guide me on this with some code. I have no clues as to how to proceed.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39608156
I can't now as i haven't the time. I suggest you look at general xml parsing tutorials. If you want code producing for you on a professional basis, you're welcome to contact me via my profile
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39726407
:)
0

Featured Post

Space-Age Communications Transitions to DevOps

ViaSat, a global provider of satellite and wireless communications, securely connects businesses, governments, and organizations to the Internet. Learn how ViaSat’s Network Solutions Engineer, drove the transition from a traditional network support to a DevOps-centric model.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
Introduction This article is the first of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article explains our test automation goals. Then rationale is given for the tools we use to a…
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
This Micro Tutorial will demonstrate how to use a scrolling table in Microsoft Excel using the INDEX function.

828 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question