Student Performance Analysis
This project aims to explore the various factors influencing student performance by analyzing a dataset that captures key aspects of students' academic and personal life. The dataset includes information such as study habits (hours studied, tutoring sessions), extracurricular participation, parental involvement, family income, and sleep patterns. Additionally, it incorporates data on school-related factors like teacher quality, peer influence, and school type, along with demographic details like gender and parental education level. By conducting exploratory data analysis, the goal is to uncover relationships between these variables and students' final exam scores, shedding light on the most impactful contributors to academic success.
See the full source code on GitHub: github.com/aelluminate/student-performance-analysis
Objectives
Data Exploration: Understand the structure and contents of the dataset, identifying key variables and their distributions.
Data Cleaning: Handle missing values, outliers, and inconsistencies in the data to ensure its quality and reliability.
Visual Analysis: Create visualizations to explore relationships between different variables and their impact on student performance.
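As a rough illustration of the data cleaning objective, the sketch below counts missing values, fills a missing category, and flags outliers with the 1.5 × IQR rule. The sample rows are made up, and both the fill policy (most frequent category) and the outlier rule are assumptions chosen for illustration, not necessarily the steps used in the project.

```python
import pandas as pd

# Tiny made-up sample; values are illustrative only, not taken from the dataset.
df = pd.DataFrame({
    "Hours_Studied": [20, 25, 18, 22, 60, 21],
    "Teacher_Quality": ["High", None, "Medium", "Medium", "Low", "Medium"],
    "Exam_Score": [67, 70, 65, 68, 72, 66],
})

# Count missing values per column.
missing_counts = df.isna().sum()

# Assumed policy: fill missing categories with the most frequent one.
mode_value = df["Teacher_Quality"].mode()[0]
df["Teacher_Quality"] = df["Teacher_Quality"].fillna(mode_value)

# Flag outliers in a numeric column using the 1.5 * IQR rule.
q1, q3 = df["Hours_Studied"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["Hours_Studied"] < q1 - 1.5 * iqr) | (df["Hours_Studied"] > q3 + 1.5 * iqr)

print(missing_counts)
print(df[is_outlier])
```

Flagging outliers rather than dropping them keeps the decision about whether they are data errors or genuine extreme values separate from the detection step.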
The Data
This dataset provides a comprehensive overview of various factors affecting student performance in exams. It includes information on study habits, attendance, parental involvement, and other aspects influencing academic success.
The dataset contains the following columns:
Hours_Studied: Number of hours spent studying per week.
Attendance: Percentage of classes attended.
Parental_Involvement: Level of parental involvement in the student's education (Low, Medium, High).
Access_to_Resources: Availability of educational resources (Low, Medium, High).
Extracurricular_Activities: Participation in extracurricular activities (Yes, No).
Sleep_Hours: Average number of hours of sleep per night.
Previous_Scores: Scores from previous exams.
Motivation_Level: Student's level of motivation (Low, Medium, High).
Internet_Access: Availability of internet access (Yes, No).
Tutoring_Sessions: Number of tutoring sessions attended per month.
Family_Income: Family income level (Low, Medium, High).
Teacher_Quality: Quality of the teachers (Low, Medium, High).
School_Type: Type of school attended (Public, Private).
Peer_Influence: Influence of peers on academic performance (Positive, Neutral, Negative).
Physical_Activity: Average number of hours of physical activity per week.
Learning_Disabilities: Presence of learning disabilities (Yes, No).
Parental_Education_Level: Highest education level of parents (High School, College, Postgraduate).
Distance_from_Home: Distance from home to school (Near, Moderate, Far).
Gender: Gender of the student (Male, Female).
Exam_Score: Final exam score.
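Several of these columns are ordinal (Low, Medium, High). One way to preserve that ordering in Pandas is an ordered categorical dtype, sketched below on made-up rows. Whether the project encodes the columns this way is an assumption.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Parental_Involvement": ["Low", "High", "Medium", "High"],
    "Exam_Score": [64, 70, 67, 72],
})

# Declare the category order explicitly so sorting and comparisons respect it.
levels = pd.CategoricalDtype(["Low", "Medium", "High"], ordered=True)
df["Parental_Involvement"] = df["Parental_Involvement"].astype(levels)

# Integer codes follow the declared order: Low=0, Medium=1, High=2.
codes = df["Parental_Involvement"].cat.codes

# Ordered categoricals support comparisons against a category label.
at_least_medium = df["Parental_Involvement"] >= "Medium"

print(codes.tolist())
print(df[at_least_medium])
```

With an ordered dtype, group summaries and plots come out in Low, Medium, High order instead of alphabetical order.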
Methodology
The analysis is conducted in Python using the Pandas, NumPy, Seaborn, and Matplotlib libraries. The steps are:
Data Loading: Load the dataset into a Pandas DataFrame for further processing.
Data Exploration: Understand the structure and contents of the dataset, identifying key variables and their distributions.
Exploratory Data Analysis (EDA): Analyze the relationships between different variables and their impact on student performance.
Data Visualization: Create visualizations to represent the data and explore patterns and trends.
Bivariate Analysis: Explore relationships between pairs of variables to identify potential correlations.
Multivariate Analysis: Analyze interactions between multiple variables to uncover complex relationships.
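The loading and exploration steps above can be sketched as follows. The inline CSV text stands in for the real data file, whose name and location are not given here, and its rows are made up for illustration.

```python
import io

import pandas as pd

# Inline stand-in for the real CSV file; rows are made up for illustration.
csv_text = """Hours_Studied,Attendance,School_Type,Exam_Score
23,84,Public,67
19,64,Private,61
24,98,Public,74
29,89,Public,71
"""

# Data loading: read the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Data exploration: structure, column types, and basic distributions.
shape = df.shape                                  # (rows, columns)
dtypes = df.dtypes                                # type of each column
numeric_summary = df.describe()                   # count, mean, std, quartiles for numeric columns
school_counts = df["School_Type"].value_counts()  # frequency of each category

print(shape)
print(dtypes)
print(numeric_summary)
print(school_counts)
```

With the real file, the `io.StringIO(csv_text)` argument would be replaced by the path to the CSV.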
Visualizations
Correlation Matrix
Hours studied is positively correlated with exam scores, which matches the intuitive expectation that more study time generally goes with higher scores.
Attendance is also positively correlated with exam scores. This suggests that regular attendance is a contributing factor to academic success, possibly due to increased exposure to course material and classroom interactions.
Sleep hours, previous scores, tutoring sessions, and physical activity appear to have minimal or no impact on exam scores in this dataset.
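A correlation matrix like the one discussed here can be computed with `DataFrame.corr`, which returns pairwise Pearson correlations by default. The sketch below uses made-up values, so the printed coefficients say nothing about the real dataset.

```python
import pandas as pd

# Made-up values for illustration; not taken from the dataset.
df = pd.DataFrame({
    "Hours_Studied": [10, 15, 20, 25, 30],
    "Sleep_Hours": [7, 6, 8, 7, 7],
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Exam_Score": [60, 64, 66, 71, 75],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()

# Rank the other numeric variables by their correlation with the exam score.
score_corr = corr["Exam_Score"].drop("Exam_Score").sort_values(ascending=False)

print(corr.round(2))
print(score_corr.round(2))
```

Correlation measures linear association only, so a coefficient near zero does not rule out a non-linear relationship.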
Hours Studied vs Exam Score by Attendance
The scatter plot demonstrates a clear positive relationship between the number of hours studied and the exam score. This indicates that students who study more tend to perform better on exams.
The color coding based on attendance reveals that students with higher attendance rates generally have higher exam scores. This suggests that regular attendance may be a contributing factor to academic success, in addition to study hours.
There are some outliers, which are data points that deviate significantly from the general trend. These could be due to various factors, such as individual differences in learning styles, external circumstances, or errors in data collection.
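A scatter plot of this kind can be drawn as sketched below. This version uses Matplotlib directly with a continuous color scale for attendance, which may differ from the plotting call used in the project, and the data points are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up points for illustration only.
df = pd.DataFrame({
    "Hours_Studied": [8, 12, 18, 22, 27, 33],
    "Attendance": [62, 70, 78, 85, 91, 97],
    "Exam_Score": [59, 62, 66, 68, 71, 75],
})

fig, ax = plt.subplots(figsize=(6, 4))

# Color each point by the student's attendance percentage.
points = ax.scatter(df["Hours_Studied"], df["Exam_Score"],
                    c=df["Attendance"], cmap="viridis")
fig.colorbar(points, ax=ax, label="Attendance (%)")

ax.set_xlabel("Hours_Studied")
ax.set_ylabel("Exam_Score")
ax.set_title("Hours Studied vs Exam Score by Attendance")

fig.savefig("hours_vs_score.png", dpi=100)
```

The output file name `hours_vs_score.png` is arbitrary.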
Student's Attendance Based on their Distance from Home
The box plot indicates that there is no significant difference in the overall distribution of student attendance based on distance from home. The median attendance rates are similar across all three distance categories (near, moderate, and far).
The line plot shows a slight downward trend in average attendance as distance from home increases. However, the confidence interval is relatively wide, suggesting that the relationship between distance and attendance is not strong.
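The numbers behind a box plot and a mean line with a confidence band can also be tabulated directly. The sketch below computes the median, the mean, and a normal-approximation 95% interval for the mean per distance category. The data is made up, and the interval method is an assumption that may differ from how the plotted interval was computed.

```python
import pandas as pd

# Made-up attendance values for illustration only.
df = pd.DataFrame({
    "Distance_from_Home": ["Near", "Near", "Near",
                           "Moderate", "Moderate", "Moderate",
                           "Far", "Far", "Far"],
    "Attendance": [80, 90, 85, 78, 88, 83, 75, 85, 80],
})

order = ["Near", "Moderate", "Far"]

# Median (box plot center), mean (line plot), and standard error per group.
summary = (
    df.groupby("Distance_from_Home")["Attendance"]
      .agg(["median", "mean", "sem", "count"])
      .reindex(order)
)

# Normal-approximation 95% confidence interval for each group mean.
summary["ci_low"] = summary["mean"] - 1.96 * summary["sem"]
summary["ci_high"] = summary["mean"] + 1.96 * summary["sem"]

print(summary.round(2))
```

When the intervals of neighboring categories overlap heavily, as a wide confidence band suggests, differences between the group means should be read with caution.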
Student's Exam Score Based on Parental Involvement
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on parental involvement. The median exam scores are similar across all three levels of parental involvement (low, medium, and high).
The IQR, represented by the height of the box, is also comparable across the categories, suggesting that the variability in exam scores is similar regardless of parental involvement.
The line plot shows a positive upward trend in average exam scores as parental involvement increases. However, the confidence interval is relatively wide, suggesting that the relationship between parental involvement and exam scores is not extremely strong.
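The interquartile range mentioned above is the difference between the 75th and 25th percentiles within each category. A minimal sketch on made-up scores:

```python
import pandas as pd

# Made-up exam scores for illustration only.
df = pd.DataFrame({
    "Parental_Involvement": ["Low"] * 4 + ["Medium"] * 4 + ["High"] * 4,
    "Exam_Score": [60, 64, 68, 72, 62, 66, 70, 74, 64, 68, 72, 76],
})

# First and third quartiles per category, one column per quantile.
quartiles = (
    df.groupby("Parental_Involvement")["Exam_Score"]
      .quantile([0.25, 0.75])
      .unstack()
)

# IQR = Q3 - Q1, which is the height of the box in a box plot.
iqr = quartiles[0.75] - quartiles[0.25]

print(iqr)
```

In this toy sample the three groups are shifted copies of each other, so their IQRs are equal even though their medians differ.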
Student's Exam Score Based on Family Income
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on family income. The median exam scores are similar across all three income levels (low, medium, and high).
The line plot shows a positive upward trend in average exam scores as family income increases. However, the confidence interval is relatively wide, suggesting that the relationship between family income and exam scores is not extremely strong.
Student's Exam Score Based on Student's Attendance
The scatter plot shows a clear positive correlation between student attendance and exam scores. This indicates that students with higher attendance rates tend to perform better on exams.
The line plot confirms the positive relationship between attendance and average exam scores. The line generally slopes upward, suggesting that as attendance increases, so does the average exam score.
The confidence interval around the line indicates the uncertainty in the relationship. While the overall trend is positive, there is some variability in the average exam scores at each attendance level.
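The strength and direction of a relationship like this can be summarized with a least-squares slope and a Pearson correlation coefficient. The sketch below uses NumPy on made-up values, so the numbers it prints are not results from the dataset.

```python
import numpy as np

# Made-up values for illustration only.
attendance = np.array([60, 70, 80, 90, 100], dtype=float)
exam_score = np.array([62, 64, 67, 69, 73], dtype=float)

# Fit a straight line: exam_score is approximately slope * attendance + intercept.
slope, intercept = np.polyfit(attendance, exam_score, 1)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(attendance, exam_score)[0, 1]

print(f"slope = {slope:.2f} points per attendance percentage point")
print(f"r = {r:.3f}")
```

A positive slope with a high `r` corresponds to the upward-sloping line described above. The spread of points around the line is what the confidence interval reflects.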
Student's Exam Score Based on Hours Studied
The scatter plot shows a clear positive correlation between the number of hours studied and exam scores. This indicates that students who study more tend to perform better on exams.
The line plot confirms the positive relationship between study hours and average exam scores. The line generally slopes upward, suggesting that as students study more, their average exam scores tend to increase.
The confidence interval around the line indicates the uncertainty in the relationship. While the overall trend is positive, there is some variability in the average exam scores at each study hour level.
Student's Exam Score Based on Sleep Hours
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on sleep hours. The median exam scores are similar across all sleep hour categories.
There are a few outliers, especially in the "4" and "10" categories, indicating that some students with very few or very many sleep hours may have significantly lower or higher exam scores than the majority.
The line plot shows a slight downward trend in average exam scores as sleep hours increase. However, the confidence interval is relatively wide, suggesting that the relationship between sleep hours and exam scores is not strong.
Student's Exam Score Based on Tutoring Sessions
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on tutoring sessions. The median exam scores are similar across all tutoring session categories.
The line plot shows a slight upward trend in average exam scores as the number of tutoring sessions increases. However, the confidence interval is relatively wide, suggesting that the relationship between tutoring sessions and exam scores is not extremely strong.
Student's Exam Score Based on Parental Education Level
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on parental education level. The median exam scores are similar across all three education levels (high school, college, and postgraduate).
The line plot shows a positive upward trend in average exam scores as parental education level increases. However, the confidence interval is relatively wide, suggesting that the relationship between parental education level and exam scores is not extremely strong.
Student's Exam Score Based on Peer Influence
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on peer influence. The median exam scores are similar across all three peer influence categories (positive, negative, and neutral).
The line plot shows a V-shaped pattern, with the lowest average exam score occurring in the "Negative" peer influence category and the highest average exam scores occurring in the "Positive" and "Neutral" categories. However, the confidence interval is relatively wide, suggesting that the relationship between peer influence and exam scores is not extremely strong.
Student's Exam Score Based on Internet Access
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on internet access. The median exam scores are similar for students with and without internet access.
There are a few outliers, especially in the "No" category, indicating that some students without internet access may have significantly lower or higher exam scores than the majority.
The line plot shows a slight downward trend in average exam scores for students without internet access. However, the confidence interval is relatively wide, suggesting that the relationship between internet access and exam scores is not extremely strong.
Percentage of Students with Internet Access
The bar plot shows a significant disparity in the number of students with and without internet access. A large majority of students (92.4%) have internet access, while only 7.6% do not.
The pie chart visually represents the same distribution, with a dominant blue slice representing students with internet access and a smaller orange slice representing those without.
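Percentages like these come from the relative frequency of each category. A minimal sketch using `value_counts(normalize=True)` on a made-up column, whose 90/10 split is illustrative and not the dataset's actual distribution:

```python
import pandas as pd

# Made-up column for illustration: 9 students with access, 1 without.
df = pd.DataFrame({"Internet_Access": ["Yes"] * 9 + ["No"]})

# Relative frequency of each category, expressed as a percentage.
percentages = (
    df["Internet_Access"]
      .value_counts(normalize=True)
      .mul(100)
      .round(1)
)

print(percentages)
```

The same call applies to the other Yes/No columns, such as Extracurricular_Activities and Learning_Disabilities.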
Implications
The data indicates a significant digital divide among the students, with a large portion having access to the internet and its resources while a smaller portion is excluded.
This digital divide could impact educational equity, as students without internet access may have limited opportunities for online learning, research, and communication.
Schools and communities should prioritize efforts to bridge the digital divide by providing access to internet and digital devices, as well as training students in digital literacy skills.
In the context of remote or hybrid learning, the availability of internet access is crucial for students to participate fully and effectively. Schools may need to implement strategies to support students without internet access.
Student's Exam Score Based on Extracurricular Activities
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on participation in extracurricular activities. The median exam scores are similar for students who do and do not participate in extracurricular activities.
The line plot shows a slight upward trend in average exam scores for students who participate in extracurricular activities. However, the confidence interval is relatively wide, suggesting that the relationship between extracurricular activities and exam scores is not extremely strong.
Percentage of Students Participating in Extracurricular Activities
The bar plot shows that a majority of students (59.6%) participate in extracurricular activities, while 40.4% do not.
The pie chart visually represents the same distribution, with a larger blue slice representing students who participate in extracurricular activities and a smaller orange slice representing those who do not.
Implications
The data indicates a significant number of students are involved in extracurricular activities, suggesting a positive engagement with the school community.
Participation in extracurricular activities can contribute to student engagement, well-being, and overall development.
Schools should continue to offer a variety of extracurricular activities and ensure that all students have equal access to these opportunities.
Promoting extracurricular activities can contribute to a positive school culture and enhance student well-being.
Student's Exam Score Based on Physical Activity
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on physical activity levels. The median exam scores are similar across all physical activity categories.
The line plot shows a slight downward trend in average exam scores as physical activity levels increase. However, the confidence interval is relatively wide, suggesting that the relationship between physical activity and exam scores is not extremely strong.
Student's Exam Score Based on Learning Disabilities
The box plot indicates that there is no significant difference in the overall distribution of student exam scores based on learning disabilities. The median exam scores are similar for students with and without learning disabilities.
The line plot shows a slight downward trend in average exam scores for students with learning disabilities. However, the confidence interval is relatively wide, suggesting that the relationship between learning disabilities and exam scores is not extremely strong.
Percentage of Students with Learning Disabilities
The bar plot shows that a majority of students (89.5%) do not have learning disabilities, while 10.5% do.
The pie chart visually represents the same distribution, with a dominant blue slice representing students without learning disabilities and a smaller orange slice representing those with learning disabilities.
Results
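The analysis highlights hours studied and attendance as the factors most clearly associated with exam scores: both show a positive relationship in the correlation matrix and in the scatter and line plots. Parental involvement, family income, parental education level, tutoring sessions, and extracurricular participation show upward trends in average exam scores, but the confidence intervals are wide and the medians are similar across categories. Sleep hours and physical activity show little or no relationship with exam scores in this dataset.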
Conclusion
The findings of this analysis will shed light on the various factors that contribute to student performance, helping educators, policymakers, and parents understand the key drivers of academic success. By identifying the most impactful variables and their relationships, we can develop targeted interventions and support mechanisms to enhance student outcomes and promote a culture of learning and achievement.