<div>
<div class="content wysiwyg-content">
Before doing a linear regression on a data-set one needs to find out a few things about the data:<br />
<br />

<ul>
<li>is there a linear relation between the dependent and independent variables</li>
<li>are the variables normally distributed (if not, do a Data Transformation)</li>
</ul>
<br />
Questions:<br />
<br />

<ol type="1">
<li><a href="https://filedb.experts-exchange.com/incoming/2017/02_w07/1145520/Book1.xlsx" target="_blank" class="file-inline" title="Book1.xlsx (23 KB)">Book1.xlsx</a></li>
<li>In Excel, do I simply use the CORREL() function to determine a linear relation between the dependent and independent variables?</li>
<li>How do I interpret these results (i.e. low value = low correlation, high value = high correlation)?</li>
<li>To test normality, can I simply use the QQ Plot test or should I use other tests, like the K-S Test as well?</li>
<li>How do I interpret the K-S Test results?</li>
<li>How do I go about doing a &quot;Data Transformation&quot; and how do I know whether I should do it?</li>
<li>Please do not confuse things with only Statistics talk when giving an answer because I might not understand what you're saying. I've uploaded a sample of a data-set.</li>
</ol>
<br />
Thank you!
</div>
</div>
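To make the checks in the question concrete, here is a minimal Python sketch of the same ideas, offered as an illustration rather than a definitive procedure. It computes the Pearson correlation coefficient (the quantity Excel's CORREL() returns), a K-S distance against a normal curve fitted to the sample, and the effect of a log transform on both. The data are made up for illustration and are not taken from Book1.xlsx; the function names `pearson_r` and `ks_statistic_normal` are hypothetical helpers. Because the normal curve is fitted from the same sample, the usual K-S critical-value tables do not strictly apply, so the distance is best read comparatively: a smaller value means the data sit closer to a normal shape.

```python
import math
from statistics import NormalDist, mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL() returns."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ks_statistic_normal(values):
    """Largest gap between the sample's empirical CDF and a normal CDF
    whose mean and standard deviation are estimated from the sample."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, v in enumerate(data):
        cdf = fitted.cdf(v)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Hypothetical data (assumption): y grows roughly like exp(x).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.7, 7.4, 20.1, 54.6, 148.4, 403.4, 1096.6, 2981.0]

# Correlation before and after a log transform of y.
r_raw = pearson_r(x, y)
log_y = [math.log(v) for v in y]  # log transform needs strictly positive values
r_log = pearson_r(x, log_y)

# K-S distance from normality before and after the transform.
d_raw = ks_statistic_normal(y)
d_log = ks_statistic_normal(log_y)

print(f"r raw: {r_raw:.3f}, r after log: {r_log:.3f}")
print(f"K-S D raw: {d_raw:.3f}, K-S D after log: {d_log:.3f}")
```

With this made-up data the correlation moves closer to 1 and the K-S distance shrinks after the log transform, which is the kind of before-and-after comparison that can help decide whether a transformation is worthwhile. A correlation near +1 or -1 suggests a strong straight-line relation, while a value near 0 suggests little or none.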



Pre-Linear Regression Tests
