The code block displayed below contains an error. The code block should combine data from DataFrames itemsDf and transactionsDf, showing all rows of DataFrame itemsDf that have a matching
value in column itemId with a value in column transactionsId of DataFrame transactionsDf. Find the error.
Code block:
itemsDf.join(itemsDf.itemId==transactionsDf.transactionId)
In which order should the code blocks shown below be run in order to create a DataFrame that shows the mean of column predError of DataFrame transactionsDf per column storeId and productId,
where productId should be either 2 or 3 and the returned DataFrame should be sorted in ascending order by column storeId, leaving out any nulls in that column?
DataFrame transactionsDf:
1. +-------------+---------+-----+-------+---------+----+
2. |transactionId|predError|value|storeId|productId| f|
3. +-------------+---------+-----+-------+---------+----+
4. | 1| 3| 4| 25| 1|null|
5. | 2| 6| 7| 2| 2|null|
6. | 3| 3| null| 25| 3|null|
7. | 4| null| null| 3| 2|null|
8. | 5| null| null| null| 2|null|
9. | 6| 3| 2| 25| 2|null|
10. +-------------+---------+-----+-------+---------+----+
1. .mean("predError")
2. .groupBy("storeId")
3. .orderBy("storeId")
4. transactionsDf.filter(transactionsDf.storeId.isNotNull())
5. .pivot("productId", [2, 3])
Which of the following code blocks returns a DataFrame that matches the multi-column DataFrame itemsDf, except that integer column itemId has been converted into a string column?
Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?
Which of the following code blocks immediately removes the previously cached DataFrame transactionsDf from memory and disk?