Nicholas Uhorchak 2018-03-05
RClean
RCleaner, an interactive data cleaning tool, lets users import and clean data interactively. At its core, it gives R users functionality similar to that of Microsoft Excel for preparing a dataset for analysis.
Importing and cleaning data in R is often time-consuming. Unless the dataset is first prepared in Excel or other software, R users must write scripts or command-line R code for this task. The interactive data cleaning tool lets users do the following:
- Initialize the RCleaner gadget with a dataset
- Visually inspect the dataset loaded into the gadget
- Select data columns to remove
- Select data rows to remove
- Rename columns in the data frame
- Scale the data
- Mean center the data
- Encode nominal data as numerical data
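For comparison, the inspection, removal, and renaming steps listed above might look roughly like this when written by hand in base R. This is only a sketch of the script-based workflow the gadget is meant to replace, not the gadget's own code; the data frame `df` and its column names are hypothetical.

```r
# Hypothetical example data; any data frame with two or more columns would do.
df <- data.frame(
  id     = 1:5,
  height = c(150, 160, 170, 180, 190),
  group  = c("a", "b", "a", "c", "b"),
  stringsAsFactors = FALSE
)

# Inspect the data (the gadget shows an interactive table instead).
str(df)
head(df)

# Remove a column by name.
df_cols <- df[, setdiff(names(df), "id"), drop = FALSE]

# Remove rows by position (here rows 2 and 4).
df_rows <- df_cols[-c(2, 4), , drop = FALSE]

# Rename a column.
names(df_rows)[names(df_rows) == "height"] <- "height_cm"
```

Each step needs the user to remember indexing rules such as `drop = FALSE`, which is the kind of detail a point-and-click interface hides.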
This analytic is being developed for users who need to clean data quickly or who would rather not spend a large amount of time writing code to prepare data for analysis. Typical users will have a working knowledge of R but prefer the point-and-click abilities of Microsoft Excel or similar software.
Users must be able to navigate RStudio and understand how to use an R gadget. In addition, they should know which types of data the dataset contains, whether numerical or categorical, so that they understand when the following functions of this tool apply:
- Mean center data
- Scale data
- Generate indicator variables
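The distinction between numerical and categorical columns matters because these three operations apply to different column types. The following base R sketch illustrates what each one does. It is an assumption about the intended behaviour, not the gadget's implementation: in particular, it uses `model.matrix()` from base R rather than the `dummies` package, and the data frame `x` is hypothetical.

```r
# Hypothetical data: one numerical and one nominal column.
x <- data.frame(
  weight = c(2, 4, 6, 8),
  colour = factor(c("red", "blue", "red", "green"))
)

# Mean centering (numerical columns only): subtract the column mean,
# leaving the spread unchanged.
centered <- scale(x$weight, center = TRUE, scale = FALSE)

# Scaling (numerical columns only): divide by the standard deviation.
# Note that scale(center = FALSE) divides by the root mean square instead,
# so sd() is used directly here.
scaled <- x$weight / sd(x$weight)

# Indicator variables (nominal columns only): one 0/1 column per level.
# The "- 1" drops the intercept so that every level gets its own column.
indicators <- model.matrix(~ colour - 1, data = x)
```

Applying centering or scaling to a nominal column, or indicator encoding to a continuous one, would not give meaningful results, which is why users need to know their column types beforehand.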
This analytic will utilize the following existing R packages:
- shiny
- DT
- shinythemes
- markdown
- dummies
End users will call this gadget from the associated R package.
None
Currently, the gadget only handles data frame, matrix, or tibble-like objects with two or more columns. Single vectors are not handled.
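A user could check whether an object meets this restriction before passing it to the gadget. The helper below is a hypothetical sketch and is not part of the package; the name `is_cleanable` is invented for illustration.

```r
# Hypothetical helper, not part of the package: TRUE when an object is
# table-like (data frame, tibble, or matrix) with at least two columns.
is_cleanable <- function(x) {
  (is.data.frame(x) || is.matrix(x)) && ncol(x) >= 2
}

is_cleanable(mtcars)                      # data frame with 11 columns: TRUE
is_cleanable(as.matrix(mtcars[, 1:2]))    # two-column matrix: TRUE
is_cleanable(c(1, 2, 3))                  # bare vector: FALSE
is_cleanable(mtcars[, 1, drop = FALSE])   # one-column data frame: FALSE
```

Tibbles pass the `is.data.frame()` check because they inherit from `data.frame`.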
| Feature | Description | Rank | Status | Value to user | Inputs | Outputs | Use? | Time? | Current or future version |
|---|---|---|---|---|---|---|---|---|---|
| Visual inspection of data | This feature will open the newly imported DF so the user can look at the data | 1 | COMPLETE | Quick and easy visual exploration of the imported dataset | Some dataset | Dataset output onto screen | Visual exploration of data | Yes | Current |
| Select relevant data columns to retain/remove | Allow the user to select which columns to either retain or remove from the current data | 2 | COMPLETE | Easily remove unwanted variables from the dataset | Button click | Modified DF | Data cleaning | Yes | Current |
| Select relevant data rows to retain/remove | Allow the user to select which rows to either retain or remove from the current data | 3 | COMPLETE | Easily remove unwanted rows from the dataset | Button click | Modified DF | Data cleaning | Yes | Current |
| Save clean data | User can save the "clean" data to a new dataframe in R | 4 | COMPLETE | Cleaned data saved for analysis | new name for clean DF | Clean DF | Save cleaned DF for future use | Yes | Current |
| Scale Data | Allow the user to scale the data | 5 | COMPLETE | Scale the data for future use | button click | Modified DF | Data prep | No | Current |
| Mean center data | Allow the user to center the data | 6 | COMPLETE | Mean center the data for future use | button click | Modified DF | Data prep | No | Current |
| Rename columns | Allow the user to rename columns in the DF | 7 | COMPLETE | Rename columns if necessary | Column names if necessary | Modified DF | Data cleaning | No | Future |
| Create indicator variables | Allow the user to create "dummy" variables to represent nominal data | 8 | COMPLETE | Create indicator variables | Variables to encode | Modified DF | Data prep | No | Future |
| Write "clean" data to Excel | Allow the user to write the clean data to a new Excel file | 9 | COMPLETE | Clean data is saved into an external file for future use | File location | Excel document | Save file as Excel document for future use | No | Future |
| Modify DF cells | Allow users to click on a cell and change data values | 10 | Not started | Single cell value modification | N/A | Modified DF | Change cells | No | Future |
| Impute missing values | Allow the user to impute missing values | 11 | not started | NA | Method of imputation | Modified DF | Data prep | No | Future |