You have a dataset that has multiple blocks and only numeric variables.
You are computing in a local compute context.
You plan to lag a variable named x to create a new variable named x_lagged by using a transform function. You will create a new element in the output of the function.
You need to minimize the number of missing values.
Which three actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
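To make the blocks issue concrete: a transform function runs once per block, so a naive lag produces a missing value at the top of every block unless the last value of the previous block is carried forward. The sketch below simulates that idea in base R only. The helper `lag_chunk`, the `carry` argument, and the six-row sample data are hypothetical, and it does not show the mechanism RevoScaleR itself provides for persisting values between blocks.

```r
# Lag x by one row within a block. `carry` is the last x of the previous
# block; it is NA for the first block, the only unavoidable missing value.
# Assumes each block has at least one row.
lag_chunk <- function(chunk, carry = NA_real_) {
  n <- nrow(chunk)
  chunk$x_lagged <- c(carry, head(chunk$x, -1))
  list(data = chunk, carry = chunk$x[n])
}

# Hypothetical data, split into two blocks of three rows each.
blocks <- split(data.frame(x = as.numeric(1:6)), rep(1:2, each = 3))

carry <- NA_real_
out <- vector("list", length(blocks))
for (i in seq_along(blocks)) {
  res <- lag_chunk(blocks[[i]], carry)
  out[[i]] <- res$data
  carry <- res$carry  # hand the last value on to the next block
}
result <- do.call(rbind, out)
```

With the carry in place, only the very first row of `x_lagged` is missing; without it, the first row of every block would be.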
You are running a large logistic regression for 1,000 feature variables by using the rxLogisticRegression() function in the MicrosoftML package. All of the predictor variables are numeric.
Currently, you specify the input variables separately by using the following formula.
Outcome ~ Feature000 + Feature001 + Feature002 + ... + Feature999
You discover that it takes 20 minutes to estimate each model.
You need to reduce the amount of time required to estimate each model without losing any information in the predictors.
What should you do?
You have a dataset that has a character variable.
You need to create a bag of counts of n-grams.
Which function should you use?
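For readers unfamiliar with the term, a bag of n-gram counts records how often each run of n consecutive tokens occurs, ignoring where it occurs. The base-R sketch below only illustrates that concept on a made-up sentence. The helper `ngram_counts` is hypothetical and is not the function the question asks about.

```r
# Count word n-grams in a single string (illustrative helper, base R only).
ngram_counts <- function(text, n = 2) {
  # Lowercase, split on anything that is not a letter or digit, drop empties.
  tokens <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) < n) return(table(character(0)))
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
  table(grams)
}

# "the cat" occurs twice in this made-up sentence; every other bigram once.
counts <- ngram_counts("the cat sat on the cat mat", n = 2)
```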
You plan to analyze data on a local computer. To improve performance, you plan to switch the compute context between a Microsoft SQL Server instance and the local computer.
You need to run complex code on the SQL Server, and then revert to the local compute context.
Which R code segment should you use?
You need to use ScaleR distributed processing in an Apache Hadoop environment.
Which data source should you use?