Solved

Fetching data from a webpage and inserting to excel with Java

Posted on 2013-10-28
9
344 Views
Last Modified: 2013-12-18
Hello,

I need to fetch data from the following url and insert it into excel.

http://apps.who.int/classifications/icd10/browse/2010/en

Manual entry of data is tedious by copy pasting. Was wondering, if I could accomplish the same using a simple java program.

Please find attached herewith a sample excel file. Would appreciate if someone could guide me in the correct direction as to how to proceed with this. The data need to be taken in the manner as shown in the excel.

Any help which could help me accomplish this perfectly the way in which I have demonstrated in excel, would be much appreciated with points surely.

Any queries, please revert.

Best Regards,
Anees
sample-data.xlsx
0
Comment
Question by:Sreejith22
  • 5
  • 3
9 Comments
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 39604946
The ICD-10 code set contains from 14,400 to 76,000 codes with descriptive text.  http://en.wikipedia.org/wiki/ICD-10   Putting them all in one spreadsheet is going to make the spreadsheet slow to load and cumbersome to use.  In addition, your sample spreadsheet did not include the important descriptive text that tells what the code refers to.

While I'm sure that there is probably a way to capture all that info, you might want to think about how you want to use it.  I would just download that entire web site for local use.  Or maybe even just use it as it is on the web.  It seems to have a pretty good search function that would be better than what you would find in Excel.
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 39604976
0
 

Author Comment

by:Sreejith22
ID: 39605422
I followed the link CEHJ provided and it brings me one step closer to what I want.

@DaveBaldwin - For some reason, I need to get this data inside excel badly. With just the three columns I showed and nothing extra I need.

I have attached here the xml which I downloaded from the link which CEHJ gave.

<Class code="A00-A09" kind="block">
		<SuperClass code="I"/>
		<SubClass code="A00"/>
		<SubClass code="A01"/>
		<SubClass code="A02"/>
		<SubClass code="A03"/>
		<SubClass code="A04"/>
		<SubClass code="A05"/>
		<SubClass code="A06"/>
		<SubClass code="A07"/>
		<SubClass code="A08"/>
		<SubClass code="A09"/>
		<Rubric id="D0000002" kind="preferred">
			<Label xml:lang="en" xml:space="default">Intestinal infectious diseases</Label>
		</Rubric>
	</Class>
	<Class code="A00" kind="category">
		<Meta name="MortBCode" value="001"/>
		<Meta name="MortL4Code" value="4-002"/>
		<Meta name="MortL3Code" value="3-003"/>
		<Meta name="MortL2Code" value="2-001"/>
		<Meta name="MortL1Code" value="1-002"/>
		<SuperClass code="A00-A09"/>
		<SubClass code="A00.0"/>
		<SubClass code="A00.1"/>
		<SubClass code="A00.9"/>
		<Rubric id="D0000003" kind="preferred">
			<Label xml:lang="en" xml:space="default">Cholera</Label>
		</Rubric>
	</Class>
	<Class code="A00.0" kind="category">
		<Meta name="MortBCode" value="001"/>
		<Meta name="MortL4Code" value="4-002"/>
		<Meta name="MortL3Code" value="3-003"/>
		<Meta name="MortL2Code" value="2-001"/>
		<Meta name="MortL1Code" value="1-002"/>
		<SuperClass code="A00"/>
		<Rubric id="D0000004" kind="preferred">
			<Label xml:lang="en" xml:space="default">Cholera due to Vibrio cholerae 01, biovar cholerae</Label>
		</Rubric>
		<Rubric id="D0000934" kind="inclusion">
			<Label xml:lang="en" xml:space="default">Classical cholera</Label>
		</Rubric>
	</Class>
	<Class code="A00.1" kind="category">
		<Meta name="MortBCode" value="001"/>
		<Meta name="MortL4Code" value="4-002"/>
		<Meta name="MortL3Code" value="3-003"/>
		<Meta name="MortL2Code" value="2-001"/>
		<Meta name="MortL1Code" value="1-002"/>
		<SuperClass code="A00"/>
		<Rubric id="D0000005" kind="preferred">
			<Label xml:lang="en" xml:space="default">Cholera due to Vibrio cholerae 01, biovar eltor</Label>
		</Rubric>
		<Rubric id="D0000935" kind="inclusion">
			<Label xml:lang="en" xml:space="default">Cholera eltor</Label>
		</Rubric>
	</Class>
	<Class code="A00.9" kind="category">
		<Meta name="MortBCode" value="001"/>
		<Meta name="MortL4Code" value="4-002"/>
		<Meta name="MortL3Code" value="3-003"/>
		<Meta name="MortL2Code" value="2-001"/>
		<Meta name="MortL1Code" value="1-002"/>
		<SuperClass code="A00"/>
		<Rubric id="D0000006" kind="preferred">
			<Label xml:lang="en" xml:space="default">Cholera, unspecified</Label>
		</Rubric>
	</Class>

Open in new window


The above given code is an excerpt from the attached xml. From the attached xml, can someone let me know how I can fetch the the three data as shown in the excel file attached in the question.

Any help is appreciated.
icd102010en.xml
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39605454
You can't really. A spreadsheet is not for presenting hierarchical data - it's for presenting flat data. If you want to force it into a spreadsheet, then you need to force the hierarchy flat - and then preferably use csv
0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 

Author Comment

by:Sreejith22
ID: 39605478
I do not need to have a hierarchical representation of data in the excel file. Just a code and the corresponding full name, classification and sub-classification.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39605596
You'll need to parse the ClaML  xml. And you're going to need the DTD:

http://clam.eggbird.eu/download/EN14463-2007.dtd
0
 

Author Comment

by:Sreejith22
ID: 39608151
can someone please guide me on this with some code. I have no clues as to how to proceed.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39608156
I can't now as i haven't the time. I suggest you look at general xml parsing tutorials. If you want code producing for you on a professional basis, you're welcome to contact me via my profile
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39726407
:)
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Introduction This article is the second of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers the basic installation and configuration of the test automation tools used by…
Introduction This article is the last of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers our test design approach and then goes through a simple test case example, how …
The viewer will learn how to use a discrete random variable to simulate the return on an investment over a period of years, create a Monte Carlo simulation using the discrete random variable, and create a graph to represent the possible returns over…
This Micro Tutorial will demonstrate how to use a scrolling table in Microsoft Excel using the INDEX function.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now