Data Analyst Quiz
31. What does the SUM function in SQL do?
Counts the number of records
Returns the total sum of a numeric column
Returns the highest value in a column
Sorts the records
32. In a box plot, what does the box represent?
The mean value of the dataset
The interquartile range (IQR)
The maximum value
The outliers
33. What is the main use of the GROUP BY clause in SQL?
To filter rows
To join tables
To aggregate data based on columns
To sort records
34. Which of the following is a Python library used for machine learning?
NumPy
Pandas
Scikit-learn
Seaborn
35. What does the WHERE clause in SQL do?
Joins two tables
Filters records based on a condition
Groups data
Aggregates data
36. In Python, what does the DataFrame object represent in the Pandas library?
A 2D labeled data structure
A 3D data array
A list of data
A tool for machine learning
37. What is the purpose of the HAVING clause in SQL?
To filter rows before aggregation
To filter rows after aggregation
To join multiple tables
To delete records
38. What is the primary purpose of the Python library Pandas?
Visualization
Numerical computations
Data manipulation and analysis
Machine learning
39. Which SQL command is used to retrieve data from a database?
INSERT
SELECT
DELETE
UPDATE
40. What is the difference between a LEFT JOIN and an INNER JOIN in SQL?
LEFT JOIN returns only matching rows
LEFT JOIN returns all rows from the left table, and INNER JOIN returns only matches
INNER JOIN returns unmatched rows
LEFT JOIN removes duplicates
41. In Python, which function (from the standard statistics module) is used to calculate the mean of a list of numbers?
mean()
sum()
avg()
max()
42. What does the term "data lake" refer to?
A tool for data visualization
A centralized repository for raw, unstructured data
A method for data cleansing
A type of relational database
43. What does the SQL DISTINCT keyword do?
Returns all rows
Removes duplicates from the result set
Sorts the data
Joins multiple tables
44. What is the median in a dataset?
The sum of all values divided by the number of values
The most frequent value in a dataset
The middle value when the dataset is ordered
The highest value in the dataset
45. What is the function of a heatmap in data visualization?
To display data using color to represent values
To create 3D charts
46. What does a scatter plot represent in data visualization?
The distribution of categories
The relationship between two continuous variables
A comparison of averages
The distribution of time
47. What is a histogram used for in data analysis?
To represent the frequency distribution of a dataset
To compare categories
To show relationships between two variables
To display time series data
48. What is the Pearson correlation coefficient used for?
To measure the distribution of data
To measure the variance
To measure the linear correlation between two variables
To group similar data
49. In machine learning, what is overfitting?
A model that performs well on new data
A model that is too complex and performs well on training data but poorly on unseen data
A model with too few features
A model that underestimates the variance
50. What is the purpose of feature scaling in machine learning?
To reduce the number of features
To normalize the range of independent variables
To increase model complexity
To split data into training and testing sets