Wednesday, March 11, 2020

Data handling Essays

I have chosen three categories from the data: height, hand span and shoe size. My first hypothesis is that taller people will have bigger feet. My second hypothesis is that taller people will have bigger hand spans. I think that there will be a positive correlation between height and both shoe size and hand span because it's common sense: most tall people I see on the street have big hands and feet. I think that both correlations will be very high because of this.

I picked a sample of 50 people because 50 out of 261 pupils is just under 20% of the pupils. It uses a fair share of the people, and 50 is a convenient number to work with. I picked the sample of 50 by using stratified sampling, since a purely random choice might not be representative: because it is random, the sample could contain 50 boys or 50 girls. A stratified sample is more representative because it is based on the make-up of the whole group, but it is still not perfect.

The data that I needed to stratify by are the dates of birth and the gender of the pupils, because older pupils are normally taller.

| Month | Boys | Girls | Total amount |
|---|---|---|---|
| September | 11 | 9 | 20 |
| October | 14 | 6 | 20 |
| November | 7 | 13 | 20 |
| December | 9 | 17 | 26 |
| January | 13 | 6 | 19 |
| February | 17 | 10 | 27 |
| March | 15 | 17 | 32 |
| April | 9 | 8 | 17 |
| May | 9 | 11 | 20 |
| June | 12 | 13 | 25 |
| July | 9 | 16 | 25 |
| August | 4 | 6 | 10 |
| Total | 129 | 132 | 261 |

I counted all the pupils and it came to 261 pupils: 129 boys and 132 girls. To stratify the data and find out how many boys born in September to include, I divided the number of boys born in September, which is 11, by the total, which is 261, and multiplied by 50, the sample size needed. If any data is missing or obviously wrong, I will use another person instead.

11/261 × 50 = 2.1

I did that for all of the amounts.
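The proportional calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the month-by-month counts from this investigation; the function name `allocate` and the variable names are my own choices.

```python
# Pupil counts by birth month, taken from the table in this investigation.
boys = {"September": 11, "October": 14, "November": 7, "December": 9,
        "January": 13, "February": 17, "March": 15, "April": 9,
        "May": 9, "June": 12, "July": 9, "August": 4}
girls = {"September": 9, "October": 6, "November": 13, "December": 17,
         "January": 6, "February": 10, "March": 17, "April": 8,
         "May": 11, "June": 13, "July": 16, "August": 6}

SAMPLE_SIZE = 50
population = sum(boys.values()) + sum(girls.values())  # 129 + 132 = 261


def allocate(count, population, sample_size):
    """Unrounded share of the sample for one group (count / population * sample size)."""
    return count / population * sample_size


# Round each share to the nearest whole pupil.
boys_rounded = {m: round(allocate(c, population, SAMPLE_SIZE)) for m, c in boys.items()}
girls_rounded = {m: round(allocate(c, population, SAMPLE_SIZE)) for m, c in girls.items()}

print(round(allocate(boys["September"], population, SAMPLE_SIZE), 2))  # 2.11
print(sum(boys_rounded.values()), sum(girls_rounded.values()))         # 25 24
```

Plain rounding gives 25 boys but only 24 girls, so one group has to be adjusted by hand to bring the sample up to 50.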
For example, the next amounts were 14/261 × 50, 7/261 × 50, and so on.

| Month | Boys | Girls | Total amount | Stratified amount for boys | Stratified amount for girls |
|---|---|---|---|---|---|
| September | 11 | 9 | 20 | 2.11 | 1.72 |
| October | 14 | 6 | 20 | 2.68 | 1.15 |
| November | 7 | 13 | 20 | 1.34 | 2.49 |
| December | 9 | 17 | 26 | 1.72 | 3.26 |
| January | 13 | 6 | 19 | 2.49 | 1.15 |
| February | 17 | 10 | 27 | 3.26 | 1.92 |
| March | 15 | 17 | 32 | 2.87 | 3.26 |
| April | 9 | 8 | 17 | 1.72 | 1.53 |
| May | 9 | 11 | 20 | 1.72 | 2.11 |
| June | 12 | 13 | 25 | 2.3 | 2.49 |
| July | 9 | 16 | 25 | 1.72 | 3.07 |
| August | 4 | 6 | 10 | 0.77 | 1.15 |
| Total | 129 | 132 | 261 | 24.71 | 25.29 |

Next I rounded the numbers to the nearest whole number. Rounding every value this way gives 25 boys but only 24 girls, so one of the 2.49 values (November girls) is taken up to 3 to make the sample total 50.

| Month | Boys | Girls | Total amount |
|---|---|---|---|
| September | 2 | 2 | 4 |
| October | 3 | 1 | 4 |
| November | 1 | 3 | 4 |
| December | 2 | 3 | 5 |
| January | 2 | 1 | 3 |
| February | 3 | 2 | 5 |
| March | 3 | 3 | 6 |
| April | 2 | 2 | 4 |
| May | 2 | 2 | 4 |
| June | 2 | 2 | 4 |
| July | 2 | 3 | 5 |
| August | 1 | 1 | 2 |
| Total | 25 | 25 | 50 |

I then picked out the 50 pupils at random and took the three categories that I needed, which are the height, the hand span and the shoe size.

I will use scatter diagrams and Spearman's rank to see what correlation each hypothesis has and how strong the correlation is.

Scatter Diagrams

A scatter diagram tells you how closely two things are related, which is called correlation.

A strong correlation means the two things are closely related to each other. A weak correlation means there is very little relationship. The line of best fit is a line that goes roughly through the middle of the points. The line can start from anywhere, not just from the y-axis, and it doesn't have to go through any of the points exactly, but it can. If the line slopes up it's a positive correlation; if it slopes down it's a negative correlation. No correlation means there's no linear relationship.

For example:

Hypothesis 1, taller people have bigger feet.

This scatter diagram has a positive correlation because the line of best fit has a positive gradient. The correlation is only moderately strong because the points are not close together and many are not close to the line of best fit, but the diagram still shows that taller people tend to have bigger feet.

Hypothesis 2, taller people have bigger hand spans.

This scatter diagram has a positive correlation. This diagram has a stronger correlation because the points are more bunched up.
Both diagrams use the same scale so they are easy to compare. The points are quite close to the line of best fit. This shows that taller people tend to have bigger hands.

Spearman's Rank

To compare the strength of the correlations accurately, we have to use Spearman's rank.

Spearman's rank is written as r and it is a measure of the agreement between two sets of data. It is a more precise way of saying how strong the correlation is. The scale of Spearman's rank is from -1 to 1.

-1 indicates perfect negative correlation. This is sometimes called disagreement. This rarely happens.

0 indicates no correlation. This is sometimes described as neither agreeing nor disagreeing.

+1 indicates perfect positive correlation. This is sometimes called agreement. This rarely happens.

Each data value is given a rank depending on its size within the data set. r is based on the difference, d, between corresponding ranks. Spearman's rank correlation coefficient is

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d is the difference between corresponding ranks (it does not matter if the difference is negative, as you have to square it) and n is the number of data pairs.

If two or more data values are the same, they have tied ranking. For example, if two values are tied for 3rd and 4th place, use the mean: 3 + 4 = 7 and 7/2 = 3.5, so use 3.5 for both.

Hypothesis 1, taller people have bigger feet

I ranked the heights of the pupils in order from 1 to 50. I did the same again for the shoe size, ranking them from 1 to 50.
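The tied-ranking rule and the rank correlation formula described above can be sketched in Python. This is a minimal illustration, not the method used to produce the tables: the function names `average_ranks` and `spearman` are my own, and the five height and shoe-size pairs are illustrative values rather than the full sample. When there are ties, the simple formula based on the sum of d squared is only an approximation.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) downwards; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2  # e.g. 3rd and 4th place -> 3.5
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference in ranks."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Illustrative data only: five (height in cm, shoe size) pairs.
heights = [193, 191, 188, 185, 185]
shoes = [12, 13, 11, 11, 9]

print(average_ranks(heights))  # [1.0, 2.0, 3.0, 4.5, 4.5]
print(average_ranks(shoes))    # [2.0, 1.0, 3.5, 3.5, 5.0]
print(round(spearman(heights, shoes), 3))  # 0.825
```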
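I calculated their differences in ranks and squared the difference for all of them.

| Height (nearest cm) | Ranking order | Shoe size | Ranking order | Difference (d) | d × d |
|---|---|---|---|---|---|
| 193 | 1 | 12 | 2 | 1 | 1 |
| 191 | 2 | 13 | 1 | -1 | 1 |
| 188 | 3 | 11 | 5.5 | 2.5 | 6.25 |
| 185 | 4.5 | 11 | 5.5 | 1 | 1 |
| 185 | 4.5 | 9 | 14 | 9.5 | 90.25 |
| 182 | 6 | 9 | 14 | 8 | 64 |
| 181 | 7 | 10.5 | 9 | 2 | 4 |
| 180 | 8 | 11 | 5.5 | -2.5 | 6.25 |
| 178 | 9 | 8 | 21.5 | 12.5 | 156.25 |
| 177 | 10 | 11 | 5.5 | -4.5 | 20.25 |
| 176 | 11 | 9 | 14 | 3 | 9 |
| 175 | 12 | 9 | 14 | 2 | 4 |
| 174 | 13.5 | 11 | 5.5 | -8 | 64 |
| 174 | 13.5 | 9 | 14 | 0.5 | 0.25 |
| 173 | 16 | 9 | 14 | -2 | 4 |
| 173 | 16 | 8 | 21.5 | 5.5 | 30.25 |
| 173 | 16 | 7 | 27.5 | 11.5 | 132.25 |
| 172 | 18 | 9 | 14 | -4 | 16 |
| 171 | 19 | 8 | 21.5 | 2.5 | 6.25 |
| 170 | 21.5 | 11 | 5.5 | -16 | 256 |
| 170 | 21.5 | 9 | 14 | -7.5 | 56.25 |
| 170 | 21.5 | 8.5 | 19 | -2.5 | 6.25 |
| 170 | 21.5 | 5 | 41.5 | 20 | 400 |
| 169 | 24.5 | 7 | 27.5 | 3 | 9 |
| 169 | 24.5 | 6 | 34 | 9.5 | 90.25 |
| 168 | 27 | 8 | 21.5 | -5.5 | 30.25 |
| 168 | 27 | 7 | 27.5 | 0.5 | 0.25 |
| 168 | 27 | 5 | 41.5 | 14.5 | 210.25 |
| 167 | 29.5 | 5 | 41.5 | 12 | 144 |
| 167 | 29.5 | 4 | 48.5 | 19 | 361 |
| 165 | 31.5 | 7 | 27.5 | -4 | 16 |
| 165 | 31.5 | 7 | 27.5 | -4 | 16 |
| 164 | 33.5 | 7 | 27.5 | -6 | 36 |
| 164 | 33.5 | 5 | 41.5 | 8 | 64 |
| 163 | 35 | 9 | 14 | -21 | 441 |
| 163 | 35 | 6.5 | 32 | -3 | 9 |
| 163 | 35 | 6 | 34 | -1 | 1 |
| 160 | 39.5 | 7 | 27.5 | -12 | 144 |
| 160 | 39.5 | 6 | 34 | -5.5 | 30.25 |
| 160 | 39.5 | 5.5 | 36 | -3.5 | 12.25 |
| 160 | 39.5 | 5 | 41.5 | 2 | 4 |
| 158 | 42 | 5 | 41.5 | -0.5 | 0.25 |
| 157 | 43 | 5 | 41.5 | -1.5 | 2.25 |
| 155 | 44.5 | 7 | 27.5 | -17 | 289 |
| 155 | 44.5 | 5 | 41.5 | -3 | 9 |
| 154 | 46.5 | 4.5 | 47 | 0.5 | 0.25 |
| 154 | 46.5 | 4 | 48.5 | 2 | 4 |
| 153 | 48 | 5 | 41.5 | -6.5 | 42.25 |
| 150 | 49 | 5 | 41.5 | -7.5 | 56.25 |
| 145 | 50 | 3.5 | 50 | 0 | 0 |
| Total | | | | | 3356.5 |

The total was 3356.5 when I added them all up. I substituted it into the formula:

$$r = 1 - \frac{6 \times 3356.5}{50(50^2 - 1)} = 1 - \frac{20139}{124950} \approx 0.84$$

The answer 0.84 is very close to 1 so it has a very strong correlation. This suggests that taller people have bigger feet.

Hypothesis 2, taller people have bigger hand spans.

I ranked the heights of the pupils in order from 1 to 50. I did the same again for the hand span, ranking them from 1 to 50.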
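I calculated their differences in ranks and squared the difference for all of them.

| Height (nearest cm) | Ranking order | Hand span (nearest mm) | Ranking order | Difference (d) | d × d |
|---|---|---|---|---|---|
| 193 | 1 | 240 | 3 | 2 | 4 |
| 191 | 2 | 250 | 1 | -1 | 1 |
| 188 | 3 | 241 | 2 | -1 | 1 |
| 185 | 4.5 | 190 | 29.5 | 25 | 625 |
| 185 | 4.5 | 220 | 10 | 5.5 | 30.25 |
| 182 | 6 | 210 | 15.5 | 9.5 | 90.25 |
| 181 | 7 | 235 | 4.5 | -2.5 | 6.25 |
| 180 | 8 | 235 | 4.5 | -3.5 | 12.25 |
| 178 | 9 | 210 | 15.5 | 6.5 | 42.25 |
| 177 | 10 | 210 | 15.5 | 5.5 | 30.25 |
| 176 | 11 | 230 | 7 | -4 | 16 |
| 175 | 12 | 220 | 10 | -2 | 4 |
| 174 | 13.5 | 230 | 7 | -6.5 | 42.25 |
| 174 | 13.5 | 200 | 22 | 8.5 | 72.25 |
| 173 | 16 | 190 | 29.5 | 13.5 | 182.25 |
| 173 | 16 | 200 | 22 | 6 | 36 |
| 173 | 16 | 210 | 15.5 | -0.5 | 0.25 |
| 172 | 18 | 212 | 13 | -5 | 25 |
| 171 | 19 | 220 | 10 | -9 | 81 |
| 170 | 21.5 | 215 | 12 | -9.5 | 90.25 |
| 170 | 21.5 | 170 | 44.5 | 23 | 529 |
| 170 | 21.5 | 200 | 22 | 0.5 | 0.25 |
| 170 | 21.5 | 170 | 44.5 | 23 | 529 |
| 169 | 24.5 | 195 | 26 | 1.5 | 2.25 |
| 169 | 24.5 | 180 | 37.5 | 13 | 169 |
| 168 | 27 | 230 | 7 | -20 | 400 |
| 168 | 27 | 180 | 37.5 | 10.5 | 110.25 |
| 168 | 27 | 150 | 50 | 23 | 529 |
| 167 | 29.5 | 180 | 37.5 | 8 | 64 |
| 167 | 29.5 | 200 | 22 | -7.5 | 56.25 |
| 165 | 31.5 | 161 | 48 | 16.5 | 272.25 |
| 165 | 31.5 | 200 | 22 | -9.5 | 90.25 |
| 164 | 33.5 | 180 | 37.5 | 4 | 16 |
| 164 | 33.5 | 190 | 29.5 | -4 | 16 |
| 163 | 35 | 198 | 25 | -10 | 100 |
| 163 | 35 | 201 | 19 | -16 | 256 |
| 163 | 35 | 207 | 18 | -17 | 289 |
| 160 | 39.5 | 185 | 32.5 | -7 | 49 |
| 160 | 39.5 | 181 | 34 | -5.5 | 30.25 |
| 160 | 39.5 | 180 | 37.5 | -2 | 4 |
| 160 | 39.5 | 180 | 37.5 | -2 | 4 |
| 158 | 42 | 191 | 27 | -15 | 225 |
| 157 | 43 | 170 | 44.5 | 1.5 | 2.25 |
| 155 | 44.5 | 185 | 32.5 | -12 | 144 |
| 155 | 44.5 | 176 | 41 | -3.5 | 12.25 |
| 154 | 46.5 | 170 | 44.5 | -2 | 4 |
| 154 | 46.5 | 160 | 49 | 2.5 | 6.25 |
| 153 | 48 | 190 | 29.5 | -18.5 | 342.25 |
| 150 | 49 | 170 | 44.5 | -4.5 | 20.25 |
| 145 | 50 | 170 | 44.5 | -5.5 | 30.25 |
| Total | | | | | 5694 |

The total was 5694 when I added them all up. I substituted it into the formula:

$$r = 1 - \frac{6 \times 5694}{50(50^2 - 1)} = 1 - \frac{34164}{124950} \approx 0.73$$

The answer 0.73 is close to 1 so it has a strong correlation. This suggests that taller people have bigger hand spans, but this correlation is not as strong as the other one. For hypothesis 1, the answer was 0.16 away from perfect positive correlation. For hypothesis 2, the answer was 0.27 away from perfect positive correlation.

So hypothesis 1 has the stronger correlation. A taller person is more likely to have bigger feet than a large hand span.

Just for the Height

I will then treat boys and girls separately because the results may differ. I wonder if there is any significant difference between the ways the heights of boys and girls are distributed, because even a small difference could make the whole result different.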
I will use standard deviation and Spearman's rank later to test this.

My hypothesis is that the boys will have a higher dispersion.

Mean, Mode and Median

I have also decided to calculate three averages: mean, mode and median.

The Mean

The mean is the average: total of items / number of items. You add up all the values and divide by the number of values. This is a useful average because it uses all the data. The disadvantage is that it can be affected by extreme values.

I added up all the heights of the boys and it came to 4363cm. There were 25 values so it was 4363/25. The answer was 174.5cm. The average height of the boys was 174.5cm.

I added up all the heights of the girls and it came to 4062cm. There were 25 values so it was 4062/25. The answer was 162.5cm. The average height of the girls was 162.5cm.

The Mode

The mode is the most common value in the data. This is easy to find but it does not use all the data.

The mode for boys is 170cm.

The mode for girls is 160cm.

The Median

The median is the midpoint in a series of numbers; half the data values are above the median, and half are below. For example, in the odd series 1, 4, 9, 12 and 33, 9 is the median. In the even series 1, 4, 10, 12, 33 and 88, 11 is the median (halfway between 10 and 12). The median is not necessarily the same as the mean. For example, the median of 2, 6, 10, 22 and 40 is 10 but the mean is 16. I will find the median by using a cumulative frequency curve. This is useful but it does not use all the data.

I will also look at the spread, starting with the range. The range is calculated by taking away the smallest value from the biggest. I will calculate the interquartile range using the cumulative frequency curve. It gives the spread of the middle 50% of the data and is less affected by extreme values than the range.

Standard Deviation

Standard deviation is the square root of the average of the squares of the deviations about the mean of a set of data.
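The averages and the standard deviation definition above can be checked with a short Python sketch. It uses the 25 boys' and 25 girls' heights recorded in this investigation; the helper name `std_dev` and its optional `centre` argument are my own, added so that the spread can be measured about a rounded mean as well as the exact one.

```python
import math
import statistics

# Heights in cm of the 25 boys and 25 girls in the sample.
boys = [193, 191, 188, 185, 182, 181, 180, 178, 177, 176, 175, 174, 174,
        173, 173, 172, 171, 170, 170, 170, 168, 164, 163, 160, 155]
girls = [185, 173, 170, 169, 169, 168, 168, 167, 167, 165, 165, 164, 163,
         163, 160, 160, 160, 158, 157, 155, 154, 154, 153, 150, 145]


def std_dev(values, centre=None):
    """Square root of the average squared deviation about `centre` (default: the mean)."""
    if centre is None:
        centre = sum(values) / len(values)
    return math.sqrt(sum((x - centre) ** 2 for x in values) / len(values))


print(sum(boys), round(sum(boys) / len(boys), 2))     # 4363 174.52
print(sum(girls), round(sum(girls) / len(girls), 2))  # 4062 162.48
print(statistics.mode(boys), statistics.mode(girls))  # 170 160
print(statistics.median([1, 4, 9, 12, 33]))           # 9
print(statistics.median([1, 4, 10, 12, 33, 88]))      # 11.0
print(statistics.mean([2, 6, 10, 22, 40]))            # 16
print(max(boys) - min(boys), max(girls) - min(girls)) # 38 40

# Spread about the mean rounded to the nearest cm, then about the exact mean.
print(round(std_dev(boys, 175), 2), round(std_dev(girls, 163), 2))  # 8.99 8.25
print(round(std_dev(boys), 2))                                      # 8.98
```

Measuring the deviations from the rounded mean rather than the exact mean changes the boys' result only in the second decimal place.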
Standard deviation is a statistical measure of spread or variability: it measures the dispersion of a sample. This is the formula:

$$\text{standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

x is the value, n is the number of values and x̄ is the mean.

I listed all the heights of the boys. Then I took away the mean (175 to the nearest whole number) from each height. I squared the differences and added them up.

| x | x − x̄ | (x − x̄)² |
|---|---|---|
| 193 | 18 | 324 |
| 191 | 16 | 256 |
| 188 | 13 | 169 |
| 185 | 10 | 100 |
| 182 | 7 | 49 |
| 181 | 6 | 36 |
| 180 | 5 | 25 |
| 178 | 3 | 9 |
| 177 | 2 | 4 |
| 176 | 1 | 1 |
| 175 | 0 | 0 |
| 174 | -1 | 1 |
| 174 | -1 | 1 |
| 173 | -2 | 4 |
| 173 | -2 | 4 |
| 172 | -3 | 9 |
| 171 | -4 | 16 |
| 170 | -5 | 25 |
| 170 | -5 | 25 |
| 170 | -5 | 25 |
| 168 | -7 | 49 |
| 164 | -11 | 121 |
| 163 | -12 | 144 |
| 160 | -15 | 225 |
| 155 | -20 | 400 |
| Total | | 2022 |

The total was 2022 and I substituted it into the formula:

$$\sqrt{\frac{2022}{25}} = \sqrt{80.88} \approx 8.99$$

In the end, the answer was 8.99, or 9 to the nearest whole number.

I listed all the heights of the girls. Then I took away the mean (163 to the nearest whole number) from each height. I squared the differences and added them up.

| x | x − x̄ | (x − x̄)² |
|---|---|---|
| 185 | 22 | 484 |
| 173 | 10 | 100 |
| 170 | 7 | 49 |
| 169 | 6 | 36 |
| 169 | 6 | 36 |
| 168 | 5 | 25 |
| 168 | 5 | 25 |
| 167 | 4 | 16 |
| 167 | 4 | 16 |
| 165 | 2 | 4 |
| 165 | 2 | 4 |
| 164 | 1 | 1 |
| 163 | 0 | 0 |
| 163 | 0 | 0 |
| 160 | -3 | 9 |
| 160 | -3 | 9 |
| 160 | -3 | 9 |
| 158 | -5 | 25 |
| 157 | -6 | 36 |
| 155 | -8 | 64 |
| 154 | -9 | 81 |
| 154 | -9 | 81 |
| 153 | -10 | 100 |
| 150 | -13 | 169 |
| 145 | -18 | 324 |
| Total | | 1703 |

The total was 1703 and I substituted it into the formula:

$$\sqrt{\frac{1703}{25}} = \sqrt{68.12} \approx 8.25$$

In the end, the answer was 8.25, or 8 to the nearest whole number.

Cumulative Frequency Diagrams (On graph paper)

Box Plots (On graph paper)

In the end, my results show that I was right with my hypothesis that the boys have a higher dispersion. The boys' standard deviation (8.99) is 0.74 higher than the girls' (8.25), a difference of 1 to the nearest whole number.

I think there are significant enough differences between the modes, medians and means of the boys' and girls' heights to treat them separately.

Scatter Diagrams for males and females for Hypothesis 1

The results will be different for the boys and for the girls, so the correlation for boys and girls will be different. I will have to investigate further to show this. I will do it by creating two scatter diagrams, one for boys and one for girls, after sorting the data into males and females.

The scatter diagrams showed that boys tend to be taller and have bigger feet.
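However, judging by the diagrams, the girls seem to have a stronger correlation. Their points seem closer to the line of best fit. To check this, I had to use Spearman's rank again.

Spearman's Rank for Hypothesis 1

I will have to do the same as before. I ranked the heights of the boys in order from 1 to 25. I did the same again for the shoe size, ranking them from 1 to 25. I calculated their differences in ranks and squared the difference for all of them.

| Gender | Height (nearest cm) | Rank order | Shoe size | Rank order | d | d × d |
|---|---|---|---|---|---|---|
| M | 193 | 1 | 12 | 2 | 1 | 1 |
| M | 191 | 2 | 13 | 1 | -1 | 1 |
| M | 188 | 3 | 11 | 5.5 | 2.5 | 6.25 |
| M | 185 | 4 | 11 | 5.5 | 1.5 | 2.25 |
| M | 182 | 5 | 9 | 13.5 | 8.5 | 72.25 |
| M | 181 | 6 | 10.5 | 9 | 3 | 9 |
| M | 180 | 7 | 11 | 5.5 | -1.5 | 2.25 |
| M | 178 | 8 | 8 | 20.5 | 12.5 | 156.25 |
| M | 177 | 9 | 11 | 5.5 | -3.5 | 12.25 |
| M | 176 | 10 | 9 | 13.5 | 3.5 | 12.25 |
| M | 175 | 11 | 9 | 13.5 | 2.5 | 6.25 |
| M | 174 | 12.5 | 11 | 5.5 | -7 | 49 |
| M | 174 | 12.5 | 9 | 13.5 | 1 | 1 |
| M | 173 | 14.5 | 8 | 20.5 | 6 | 36 |
| M | 173 | 14.5 | 9 | 13.5 | -1 | 1 |
| M | 172 | 16 | 9 | 13.5 | -2.5 | 6.25 |
| M | 171 | 17 | 8 | 20.5 | 3.5 | 12.25 |
| M | 170 | 19 | 8.5 | 18 | -1 | 1 |
| M | 170 | 19 | 9 | 13.5 | -5.5 | 30.25 |
| M | 170 | 19 | 11 | 5.5 | -13.5 | 182.25 |
| M | 168 | 21 | 8 | 20.5 | -0.5 | 0.25 |
| M | 164 | 22 | 7 | 24 | 2 | 4 |
| M | 163 | 23 | 9 | 13.5 | -9.5 | 90.25 |
| M | 160 | 24 | 7 | 24 | 0 | 0 |
| M | 155 | 25 | 7 | 24 | -1 | 1 |
| Total | | | | | | 695.5 |

The total was 695.5 when I added them all up. I substituted it into the formula:

$$r = 1 - \frac{6 \times 695.5}{25(25^2 - 1)} = 1 - \frac{4173}{15600} \approx 0.73$$

The answer 0.73 is close to 1 so it has a strong correlation. This suggests that taller boys have bigger feet.

I ranked the heights of the girls in order from 1 to 25. I did the same again for the shoe size, ranking them from 1 to 25. I calculated their differences in ranks and squared the difference for all of them.

| Gender | Height (nearest cm) | Rank order | Shoe size | Rank order | d | d × d |
|---|---|---|---|---|---|---|
| F | 185 | 1 | 9 | 1 | 0 | 0 |
| F | 173 | 2 | 7 | 4 | 2 | 4 |
| F | 170 | 3 | 5 | 16.5 | 13.5 | 182.25 |
| F | 169 | 4.5 | 7 | 4 | -0.5 | 0.25 |
| F | 169 | 4.5 | 6 | 9 | 4.5 | 20.25 |
| F | 168 | 6.5 | 5 | 16.5 | 10 | 100 |
| F | 168 | 6.5 | 7 | 4 | -2.5 | 6.25 |
| F | 167 | 8.5 | 4 | 23.5 | 15 | 225 |
| F | 167 | 8.5 | 5 | 16.5 | 8 | 64 |
| F | 165 | 10.5 | 7 | 4 | -6.5 | 42.25 |
| F | 165 | 10.5 | 7 | 4 | -6.5 | 42.25 |
| F | 164 | 12 | 5 | 16.5 | 4.5 | 20.25 |
| F | 163 | 13.5 | 6 | 9 | -4.5 | 20.25 |
| F | 163 | 13.5 | 6.5 | 7 | -6.5 | 42.25 |
| F | 160 | 16 | 5 | 16.5 | 0.5 | 0.25 |
| F | 160 | 16 | 6 | 9 | -7 | 49 |
| F | 160 | 16 | 5.5 | 11 | -5 | 25 |
| F | 158 | 18 | 5 | 16.5 | -1.5 | 2.25 |
| F | 157 | 19 | 5 | 16.5 | -2.5 | 6.25 |
| F | 155 | 20 | 5 | 16.5 | -3.5 | 12.25 |
| F | 154 | 21.5 | 4 | 23.5 | 2 | 4 |
| F | 154 | 21.5 | 4.5 | 22 | 0.5 | 0.25 |
| F | 153 | 23 | 5 | 16.5 | -6.5 | 42.25 |
| F | 150 | 24 | 5 | 16.5 | -7.5 | 56.25 |
| F | 145 | 25 | 3.5 | 25 | 0 | 0 |
| Total | | | | | | 967 |

The total was 967 when I added them all up. I substituted it into the formula:

$$r = 1 - \frac{6 \times 967}{25(25^2 - 1)} = 1 - \frac{5802}{15600} \approx 0.63$$

The answer 0.63 is not very close to 1 so it has a moderate correlation.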
This suggests that height is more closely linked to shoe size for boys than for girls, which is the opposite of what the scatter diagrams seemed to show.

Conclusion

In the end, I think that Spearman's rank was the best method because it gave a very accurate answer. It was difficult to work out all the answers, but in the end the results show that tall people have bigger feet and hand spans. However, the data were only from year 10s in Salendine Nook High School, so it only really shows that tall people in year 10 attending Salendine Nook High School have bigger feet and hand spans. It could mean that taller pupils in year 10 in other schools also have bigger feet and hand spans. We don't and we won't know, though, as there are many other factors, such as cultural background, that we would need to know about to prove our results right. The data is also flawed, as lots of information was missing and pupils inputted their data in different ways.

I think that I chose the right groups to test my hypothesis. To improve this and make my results better, I could get other schools' data or maybe data from different years in my school. People like shoe or glove makers could use this data to design more products in the region of the average size. In the end, I think I showed that my hypothesis is correct.
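As a check on the arithmetic, the four Spearman's rank results reported in this investigation can be reproduced from the totals of d × d. This is a small sketch under the usual formula r = 1 − 6Σd² / (n(n² − 1)); the function name `spearman_from_total` and the dictionary labels are my own.

```python
def spearman_from_total(sum_d2, n):
    """Spearman's rank coefficient from the total of squared rank differences."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# (total of d*d, number of data pairs) as reported in the tables.
totals = {
    "height vs shoe size, all pupils": (3356.5, 50),
    "height vs hand span, all pupils": (5694, 50),
    "height vs shoe size, boys": (695.5, 25),
    "height vs shoe size, girls": (967, 25),
}

for label, (sum_d2, n) in totals.items():
    print(f"{label}: {spearman_from_total(sum_d2, n):.2f}")
# height vs shoe size, all pupils: 0.84
# height vs hand span, all pupils: 0.73
# height vs shoe size, boys: 0.73
# height vs shoe size, girls: 0.63
```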