Title: | Helpful Functions from Oliver Wyman Actuarial Consulting |
---|---|
Description: | Makes difficult operations easy. Includes these types of functions: shorthand, type conversion, data wrangling, and workflow. Also includes some helpful data objects: NA strings, U.S. state list, color blind charting colors. Built and shared by Oliver Wyman Actuarial Consulting. Accepting proposed contributions through GitHub. |
Authors: | Oliver Wyman Actuarial Consulting [aut, cph], Bryce Chamberlain [aut, cre], Rajesh Sahasrabuddhe [ctb] |
Maintainer: | Bryce Chamberlain <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5-12 |
Built: | 2024-11-01 11:16:48 UTC |
Source: | https://github.com/oliver-wyman-actuarial/easyr |
Opposite of %in%. Author: Bryce Chamberlain.
needle %ni% haystack
needle |
Vector to search for. |
haystack |
Vector to search in. |
Boolean vector/value of comparisons.
c(1,3,11) %ni% 1:10
Prints a vector as text you can copy and paste back into the code. Helpful for copying vectors into code for testing and validation. Author: Bryce Chamberlain.
astext(x)
x |
Vector to represent as text. |
Vector represented as a character.
astext( c( 1, 2, 4 ) )
astext( c( 'a', 'b', 'c' ) )
Use easyr date and number and conversion functions to automatically convert data to the most useful type available.
atype( x, auto_convert_dates = TRUE, allow_times = FALSE, check_numbers = TRUE, nazero = FALSE, check_logical = TRUE, isexcel = TRUE, stringsAsFactors = FALSE, nastrings = easyr::nastrings, exclude = NULL, use_n_sampled_rows = min(nrow(x), 10000) )
x |
Data to auto-type. |
auto_convert_dates |
Choose to convert dates. |
allow_times |
Choose if you want to get times. Only use this if your data has times, otherwise there is a small chance it will prevent proper date conversion. |
check_numbers |
Choose to convert numbers. |
nazero |
Convert NAs in numeric columns to 0. |
check_logical |
Choose to convert logical (TRUE/FALSE) values. |
isexcel |
By default, we assume this data may have come from excel. This is to assist in date conversion from excel integers. If you know it didn't and are having issues with data conversion, set this to FALSE. |
stringsAsFactors |
Convert strings/characters to factors to save compute time, RAM/memory, and storage space. |
nastrings |
Strings to consider NA. |
exclude |
Column name(s) to exclude. |
use_n_sampled_rows |
Used on large data sets. |
Author: Bryce Chamberlain.
Data frame with column types automatically converted.
# create some data in all-characters.
x = data.frame( char = c( 'abc', 'def' ), num = c( '1', '2' ), date = c( '1/1/2018', '2018-2-01' ), na = c( NA, NA ), bool = c( 'TRUE', 'FALSE' ), stringsAsFactors = FALSE )
# different atype options. Note how the output types change.
str( atype( x ) )
str( atype( x, exclude = 'date' ) )
str( atype( x, auto_convert_dates = FALSE ) )
str( atype( x, check_logical = FALSE ) )
Perform common operations before running a script. Includes clearing environment objects, disabling scientific notation, loading common packages, running fun/ or functions/ folders, and setting the working directory to the location of the current file.
begin( wd = NULL, load = c("magrittr", "dplyr"), keep = NULL, scipen = FALSE, verbose = TRUE, repos = "http://cran.us.r-project.org", runpath = NULL )
wd |
Path to set as working directory. If blank, the location of the current file open in RStudio will be used if available. If FALSE, the working directory will not be changed. |
load |
Packages to load. If not available, they'll be installed. |
keep |
Environment objects to keep. If blank, all objects will be removed from the environment. |
scipen |
Use scientific notation in output? |
verbose |
Print information about what the function is doing? |
repos |
choose the URL to install from. |
runpath |
Folder or file to run, if specified. |
begin()
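A hedged sketch of a fuller call than the plain begin() above; 'inputs' is a hypothetical object name used only for illustration, and wd = FALSE leaves the working directory unchanged per the argument description.
# keep one object, leave the working directory alone, and run quietly.
begin( wd = FALSE, keep = 'inputs', verbose = FALSE )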
Bins a numerical column according to another numerical column's volume. For example if I want to bin a column "Age" (of people) into 10 deciles according to "CountofPeople" then I will get Age breakpoints returned by my function such that there is 10% of CountofPeople in each bin. This function handles NAs as their own separate bin, and handles any special values you want to separate out. Author: Scott Sobel. Tech Review: Bryce Chamberlain.
binbyvol(df, groupby, vol, numbins)
df |
(Data Frame) Your data. |
groupby |
(Character) Name of the column you'll create cuts in. Must be the character name of a numeric column. |
vol |
(Character) Name of the column for which each cut will have an equal percentage of volume. |
numbins |
Number of bins to use. |
Age breakpoints returned by my function such that there is 10% of CountofPeople in each bin.
# bin Sepal.Width according to Sepal.Length.
iris$bin <- binbyvol(iris, 'Sepal.Width', 'Sepal.Length', 5)
# check the binning success.
aggregate( Sepal.Length ~ bin, data = iris, sum )
Matches factor levels before binding rows. If factors have 0 levels it will change the column to character to avoid errors. Author: Bryce Chamberlain.
bindf(..., sort.levels = TRUE)
... |
Data to be bound. |
sort.levels |
Sort the factor levels after combining them. |
Bound data, with any factors modified to contain all levels in the combined data.
# create data where factors have different levels.
df1 = data.frame( factor1 = c( 'a', 'b', 'c' ), factor2 = c( 'high', 'medium', 'low' ), factor.join = c( '0349038u093843', '304359867893753', '3409783509735' ), numeric = c( 1, 2, 3 ), logical = c( TRUE, TRUE, TRUE ) )
df2 = data.frame( factor1 = c( 'd', 'e', 'f' ), factor2 = c( 'low', 'medium', 'high' ), factor.join = c( '32532532536', '304359867893753', '32534745876' ), numeric = c( 4, 5, 6 ), logical = c( FALSE, FALSE, FALSE ) )
# bindf preserves factors but combines levels.
# factor-friendly functions default to ordered levels.
str( df1 )
str( bindf( df1, df2 ) )
Utility function for capturing warnings.
cache.capture_warning(w)
w |
Captured warning passed by withCallingHandlers. |
# this will only have an effect if a current cache exists.
## Not run: 
if(!cache.ok(1)) withCallingHandlers({
  x = mtcars # base-R dataset.
  warning('warning 2-1') # this is the first warning we need to capture.
  warning('warning 2-2') # this is the second warning we need to capture.
  save.cache(x) # we'll capture it inside save.cache.
}, warning = cache.capture_warning)
## End(Not run)
Set cache info so easyr can manage the cache.
cache.init( caches, at.path, verbose = TRUE, save.only = FALSE, skip.missing = TRUE, n_processes = 2 )
caches |
List of lists with properties name, depends.on. See example. |
at.path |
Where to save the cache. If NULL, a cache/ folder will be created in the current working directory. |
verbose |
Print via cat() information about cache operations. |
save.only |
Choose not to load the cache. Use this if you need to check cache validity in multiple spots but only want to load at the last check. |
skip.missing |
Passed to hashfiles, choose if an error occurs if a depends.on file isn't found. |
n_processes |
Passed to qs to determine how many cores/workers to use when reading/saving data. |
# initialize a cache with 1 cache which depends on files in the current working directory.
# this will create a cache folder in your current working directory.
# then, you call functions to check and build the cache.
## Not run: 
folder = system.file('extdata', package = 'easyr')
cache.init(
  # Initial file read (raw except for renaming).
  caches = list(
    list( name = 'prep-files', depends.on = paste0(folder, '/script.R') )
  ),
  at.path = paste0(tempdir(), '/cache')
)
## End(Not run)
Check a cache and if necessary clear it to trigger a re-cache.
cache.ok(cache.num, do.load = TRUE)
cache.num |
The index/number for the cache we are checking in the cache.info list. |
do.load |
Load the cache if it is found. |
Boolean indicating if the cache is acceptable. FALSE indicates the cache doesn't exist or is invalid so code should be run again.
# check the first cache to see if it exists and dependent files haven't changed.
# if this is TRUE, code in brackets will get skipped and the cache will be loaded instead.
# set do.load = FALSE if you have multiple files that build a cache,
# to prevent multiple cache loads.
# output will be printed to the console to tell you if the cache was loaded or re-built.
## Not run: 
if( ! cache.ok(1) ){
  # do stuff
  # if this is the final file for this cache,
  # end with save.cache to save passed objects as a cache.
  save.cache(iris)
}
## End(Not run)
Save Cache (Alternate)
Saves the arguments to a cache file, using the cache.num last checked with cache.ok. This function provides an alternative syntax more aligned with other functions that start with "cache.".
cache.save(...)
... |
Objects to save. |
# check the first cache to see if it exists and dependent files haven't changed.
# if this check is TRUE, code in brackets will get skipped and the cache will be loaded instead.
# set do.load = FALSE if you have multiple files that build a cache,
# to prevent multiple cache loads.
# output will be printed to the console to tell you if the cache was loaded or re-built.
## Not run: 
if( ! cache.ok(1) ){
  # do stuff
  # if this is the final file for this cache,
  # end with cache.save to save passed objects as a cache.
  cache.save(iris)
}
## End(Not run)
Color palette that is effective for color-blind clients.
cblind
Named vector of hex colors.
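Since cblind is a named vector of hex colors, it can be passed directly to color arguments in base graphics or ggplot2. A minimal sketch using base graphics:
# preview the color-blind-friendly palette shipped with easyr.
barplot( rep( 1, length( easyr::cblind ) ), col = easyr::cblind, names.arg = names( easyr::cblind ) )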
Shorthand function for paste. Author: Bryce Chamberlain.
cc(..., sep = "")
... |
Arguments to be passed to paste0. Typically a list of vectors or values to be concatenated. |
sep |
(Optional) Separator between concatenated items. |
Vector of pasted/concatenated values.
cc( 1, 2, 4 )
x = data.frame( c1 = c( 1, 2, 4 ), c2 = c( 3, 5, 7 ) )
cc( x$c1, x$c2 )
cc( x$c1, x$c2, sep = '-' )
Convert all character columns in a data frame to factors. Author: Bryce Chamberlain.
char2fac(x, sortlevels = FALSE, na_level = "(Missing)")
x |
Data frame to modify. |
sortlevels |
Choose whether to sort levels. This is the default R behavior and is therefore likely faster, but it may change the order of the data and this can be problematic so the default is FALSE. |
na_level |
some functions don't like factors to have NAs so we replace NAs with this value for factors only. Set NULL to skip. |
Data frame with converted factors.
char2fac( iris )
Checks a vector or value to see if it is a number formatted as a character. Useful for checking columns formatted with $ or commas, etc. Author: Bryce Chamberlain. Tech review: Dominic Dillingham.
charnum(x, na_strings = easyr::nastrings, run_unique = TRUE, check_date = TRUE)
x |
Vector to check. |
na_strings |
Strings to consider NA. |
run_unique |
Convert to unique variables before checking. In some cases, this can make it take longer than necessary. In most, it will make it faster. |
check_date |
Check for a date, in which case it isn't a number. If you have already checked a date and know it isn't, set this to FALSE to run faster. |
True/false value indicating if the vector is a number formatted as a character. Helpful for checking before calling easyr::tonum().
charnum( c( '123', '$50.02', '30%', '(300.01)', '-10', '1 230.4', NA, '-', '', "3.7999999999999999E-2" ))
charnum( c( '123', 'abc', '30%', NA) )
# returns FALSE since this can be converted to a date:
charnum( c( '20180101' ))
Check actual versus expected values and get helpful metrics back. Author: Bryce Chamberlain. Tech review: Lindsay Smeltzer.
checkeq( expected, actual, desc = "", acceptable_pct_diff = 0.00000001, digits = 2 )
expected |
The expected value of the metric. |
actual |
The actual value of the metric. |
desc |
(Optional) Description of the metric being checked. |
acceptable_pct_diff |
(Optional) Acceptable percentage difference when checking values. Checked as an absolute value. |
digits |
(Optional) Digits to round to. Without rounding you get errors from floating values. Set to NA to avoid rounding. |
Message (via cat) indicating success or errors out in case of failure.
checkeq(expected=100,actual=100,desc='A Match')
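A sketch of a near-match check, assuming acceptable_pct_diff is compared against the absolute percentage difference as described above:
# passes as long as the difference is within the acceptable threshold.
checkeq( expected = 100, actual = 100.4, desc = 'Near Match', acceptable_pct_diff = 1 )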
Clears all caches or the cache related to the passed cache info list.
clear.cache(cache = NULL)
cache |
The cache list to clear. |
FALSE if a cache info list item is passed in order to assist other functions in returning this value, otherwise NULL.
# this will only have an effect if a current cache exists.
## Not run: 
clear.cache()
## End(Not run)
Coalesce function that matches and updates factor levels appropriately. Checks each argument vector starting with the first until a non-NA value is found. Author: Bryce Chamberlain.
coalf(...)
... |
Source vectors. |
Vector of values.
x <- sample(c(1:5, NA, NA, NA))
coalf(x, 0L)
Concatenate arguments and run them as a command. Shorthand for eval( parse( text = paste0( ... ) ) ). Consider also using base::get() which can be used to get an object from a string, but only if it already exists. Author: Bryce Chamberlain.
crun(...)
... |
Character(s) to be concatenated and run as a command. |
crun( 'print(', '"hello world!"', ')')
crun('T', 'RUE')
Date difference (or difference in days).
ddiff(x, y, unit = "day", do.date.convert = TRUE, do.numeric = TRUE)
x |
Vector of starting dates or items that can be converted to dates by todate. |
y |
Vector of ending dates or items that can be converted to dates by todate. |
unit |
Character indicating what to use as the unit of difference. Values like d, y, m or day, year, month will work. Takes just the first letter in lower-case to determine unit. |
do.date.convert |
Convert to dates before running the difference. If you know your columns are already dates, setting to FALSE will make your code run faster. |
do.numeric |
Convert the output to a number instead of a date difference object. |
Vector of differences.
ddiff( lubridate::mdy( '1/1/2018' ), lubridate::mdy( '3/4/2018' ) )
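The same pair of dates with a different unit; per the unit argument above, only the first lower-case letter matters, so 'm', 'month', and 'months' should behave the same (a sketch, output not shown).
# difference in months instead of days.
ddiff( lubridate::mdy( '1/1/2018' ), lubridate::mdy( '3/4/2018' ), unit = 'month' )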
Get information about a Data Frame or Data Table. Use getinfo to explore a single column instead. If you like, use the ecopy function or argument to copy the output to the clipboard so that it can be pasted into Excel. Otherwise it returns a data frame. Author: Scott Sobel. Tech Review & Modifications: Bryce Chamberlain.
dict( x, topn = 5, botn = 5, na.strings = easyr::nastrings, do.atype = TRUE, ecopy = FALSE )
x |
Data Frame or Data Table. |
topn |
Number of top values to print. |
botn |
Number of bottom values to print. |
na.strings |
Strings to consider NA. |
do.atype |
Auto-determine variable types. If your data already has types set, skip this to speed up the code. |
ecopy |
Copy the output to the clipboard so that it can be pasted into Excel. |
dict(iris)
Pulls all rows with duplicates in a column, not just the duplicate row. Author: Bryce Chamberlain.
drows(x, c, na = FALSE)
x |
Data frame. |
c |
Column as vector or string. |
na |
Consider multiple NAs as duplicates? |
Rows from the data frame in which the column is duplicated.
ddt = bindf( cars, utils::head( cars, 10 ) )
drows( ddt, 'speed' )
Copies a data.frame or anything that can be converted into a data.frame. After running this, you can use ctrl+v or Edit > Paste to paste it to another program, typically Excel. A simple use case would be ecopy(names(df)) to copy the names of a data frame to the clipboard to paste to Excel or Outlook. Author: Scott Sobel. Tech Review: Bryce Chamberlain.
ecopy( x, showrowcolnames = c("cols", "rows", "both", "none"), show = FALSE, buffer = 1024 )
x |
Object you'd like to copy to the clipboard. |
showrowcolnames |
(Optional) Show row and column names. Choose 'none', 'cols', 'rows', or 'both'. |
show |
(Optional) Set to 'show' (or TRUE) if you want to also print the object to the console. |
buffer |
(Optional) Set clipboard buffer size. |
ecopy( iris, showrowcolnames = "cols", show = 'show' )
ecopy(iris)
Vectorized flexible equality comparison which considers NAs as a value. Returns TRUE if both values are NA, and FALSE when only one is NA. The standard == comparison returns NA in both of these cases and sometimes this is interpreted unexpectedly. Author: Bryce Chamberlain. Tech Review: Maria Gonzalez.
eq(x, y, do.nanull.equal = TRUE)
x |
First vector/value for comparison. |
y |
Second vector/value for comparison. |
do.nanull.equal |
Return TRUE if both inputs are NA or NULL (tested via easyr::nanull). |
Boolean vector/value of comparisons.
c(NA,'NA',1,2,'c') == c(NA,NA,1,2,'a') # regular equality check.
eq(c(NA,'NA',1,2,'c'),c(NA,NA,1,2,'a')) # check with eq.
Convert all factor columns in a data frame to characters. Author: Bryce Chamberlain.
fac2char(x)
x |
Data frame to modify. |
Data frame with converted characters.
fac2char( iris )
Matches factor levels before full join via merge. Author: Bryce Chamberlain.
fjoinf( data.left, data.right, by, sort.levels = TRUE, restrict.levels = FALSE, na_level = "(Missing)" )
data.left |
Left data. All of this data will be preserved in the join (may result in duplication). |
data.right |
Right data. All of this data will be preserved in the join (may also result in duplication). |
by |
Columns to join on. |
sort.levels |
Sort the factor levels after combining them. |
restrict.levels |
Often the joined data won't use all the levels in both datasets. Set to TRUE to remove factor levels that aren't in the joined data. |
na_level |
some functions don't like factors to have NAs so we replace NAs with this value for factors only. Set NULL to skip. |
Joined data, with any factors modified to contain all levels in the joined data.
df1 = data.frame( factor1 = c( 'a', 'b', 'c' ), factor2 = c( 'high', 'medium', 'low' ), factor.join = c( '0349038u093843', '304359867893753', '3409783509735' ), numeric = c( 1, 2, 3 ), logical = c( TRUE, TRUE, TRUE ) )
df2 = data.frame( factor1 = c( 'd', 'e', 'f' ), factor2 = c( 'low', 'medium', 'high' ), factor.join = c( '32532532536', '304359867893753', '32534745876' ), numeric = c( 4, 5, 6 ), logical = c( FALSE, FALSE, FALSE ) )
fjoinf( df1, df2, by = 'factor.join' )
Get information about data files in a folder path. Use dict() on a single data frame or getinfo() to explore a single column. Author: Bryce Chamberlain.
fldict( folder = NULL, file.list = NULL, pattern = "^[^~]+[.](xls[xmb]?|csv|rds|xml)", ignore.case = TRUE, recursive = TRUE, verbose = FALSE, ... )
folder |
File path of the folder to create a dictionary for. Pass either this or file.list. file.list will override this argument. |
file.list |
List of files to create a combined dictionary for. Pass either this or folder. This will override folder. |
pattern |
Pattern to match files in the folder. By default we use a pattern that matches read.any-compatible data files and skips temporary Office files. Passed to list.files. |
ignore.case |
Ignore case when checking pattern. Passed to list.files. |
recursive |
Check files recursively. Passed to list.files. |
verbose |
Print helpful information. |
... |
Other arguments to read.any for reading in files. Consider using a first_column_name vector, etc. |
List with the properties:
s |
Summary data of each dataset. |
l |
Line data with a row for each column in each dataset. |
folder = system.file('extdata', package = 'easyr')
fl = fldict(folder)
names(fl)
fl$sheets
fl$columns
Flexible number formatter for easier formatting from numbers and dates into characters for display.
fmat( x = NULL, type = c("auto", ",", "$", "%", ".", "mdy", "ymd", "date", "dollar", "dollars", "count", "percentage", "decimal"), do.return = c("formatted", "highcharter"), digits = NULL, with.unit = FALSE, do.date.sep = "/", do.remove.spaces = FALSE, digits.cutoff = NULL )
x |
Vector of values to convert. If retu |
type |
Type of format to return. If do.return == 'highcharter' this is not required. |
do.return |
Information to return. "formatted" returns a vector of formatted values. |
digits |
Number of digits for rounding. If left blank, the function will guess at the best digits. |
with.unit |
For large numbers, choose to add a suffix for fewer characters, like M for million, etc. |
do.date.sep |
Separator for date formatting. |
do.remove.spaces |
Remove extra spaces in return. |
digits.cutoff |
Amount at which to show 0 digits. Allows for flexibility of rounding. |
Information requested via do.return.
fmat( 1000, 'dollar', digits = 2 )
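A few more calls exercising other documented type options; the exact formatted strings aren't shown here since they depend on fmat's defaults (digit guessing, etc.).
# percentage and comma-separated count formats.
fmat( 0.1234, '%' )
fmat( 1234567, ',' )
# date formatting with a custom separator.
fmat( lubridate::mdy( '3/4/2018' ), 'mdy', do.date.sep = '-' )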
Takes bucket names of binned values such as [1e3,2e3) or [0.1234567, 0.2) and formats the values nicely into values such as 1,000-2,000 or 0.12-0.20. Author: Scott Sobel. Tech Review: Bryce Chamberlain.
getbetterint(int)
int |
Vector of character bucket names to transform. |
Vector of transformed values.
iris$bin <- binbyvol( iris, 'Sepal.Width', 'Sepal.Length', 5 )
getbetterint( iris$bin )
Get information about a Column in a Data Frame or Data Table. Use getdatadict to explore all columns in a dataset instead. Author: Scott Sobel. Tech Review: Bryce Chamberlain.
getinfo( df, colname, topn = 5, botn = 5, graph = TRUE, ordered = TRUE, display = TRUE, cutoff = 20, main = NULL, cex = 0.9, xcex = 0.9, bins = 50, col = "light blue" )
df |
Data Frame or Data Table. |
colname |
(Character) Name of the column to get information about. |
topn |
(Optional) Number of top values to print. |
botn |
(Optional) Number of bottom values to print. |
graph |
(Boolean Optional) Output a chart of the column. |
ordered |
(Optional) |
display |
(Optional) |
cutoff |
(Optional) |
main |
(Optional) |
cex |
(Optional) |
xcex |
(Optional) |
bins |
(Optional) |
col |
(Optional) |
Only if display = FALSE, returns information about the column. Otherwise information comes through the graphing pane and the console (via cat/print).
getinfo(iris,'Sepal.Width')
getinfo(iris,'Species')
Get the golden ratio. Author: Bryce Chamberlain. Tech Review: Maria Gonzalez.
gr()
The golden ratio: (1+sqrt(5)) / 2
gr()
Get a hash value representing a list of files. Useful for determining if files have changed in order to reset dependent caches.
hashfiles( x, skip.missing = FALSE, full.hash = FALSE, verbose = FALSE, skiptemp = TRUE )
x |
Input which specifies which files to hash. This can be a vector mix of paths and files. |
skip.missing |
Skip missing files. Default is to throw an error if a file isn't found. |
full.hash |
By default we just hash the file info (name, size, created/modified time). Set this to TRUE to read the file and hash the contents. |
verbose |
Print helpful messages from code. |
skiptemp |
Skip temporary MS Office files like "~$Simd Loss Eval 2018-06-30.xlsx" |
String representing hash of files.
folder = system.file('extdata', package = 'easyr')
hashfiles(folder)
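By default only file info is hashed; a content-based hash (slower but stricter) is a one-argument change, reusing the folder from the example above.
# hash file contents instead of just file info.
hashfiles( folder, full.hash = TRUE )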
Identify the row with headers in a data frame. It should NOT be used directly (that's why it isn't exported), but will be called by function [read.any] as necessary, with the applicable defaults set by that function.
headers_row( x, headers_on_row = NA, first_column_name = NA, field_name_map = NA )
x |
Data frame to work with. |
headers_on_row |
The specific row with headers on it. |
first_column_name |
A known column(s) that can be used to find the header row. This is more flexible, but only used if headers_on_row is not available. If multiple are possible, use a vector argument here. |
field_name_map |
field_name_map from read.any. |
List with headers_already_column_names (TRUE/FALSE) and headers_on_row (1-indexed row number, to match standard R indexing).
Matches factor levels before inner join via merge. Author: Bryce Chamberlain.
ijoinf( data.left, data.right, by, sort.levels = TRUE, restrict.levels = FALSE, na_level = "(Missing)" )
data.left |
Left data. Only rows that match the join will be included (may still result in duplication). |
data.right |
Right data. Only rows that match the join will be included (may also result in duplication). |
by |
Columns to join on. |
sort.levels |
Sort the factor levels after combining them. |
restrict.levels |
Often the joined data won't use all the levels in both datasets. Set to TRUE to remove factor levels that aren't in the joined data. |
na_level |
some functions don't like factors to have NAs so we replace NAs with this value for factors only. Set NULL to skip. |
Joined data, with any factors modified to contain all levels in the joined data.
df1 = data.frame( factor1 = c( 'a', 'b', 'c' ), factor2 = c( 'high', 'medium', 'low' ), factor.join = c( '0349038u093843', '304359867893753', '3409783509735' ), numeric = c( 1, 2, 3 ), logical = c( TRUE, TRUE, TRUE ) )
df2 = data.frame( factor1 = c( 'd', 'e', 'f' ), factor2 = c( 'low', 'medium', 'high' ), factor.join = c( '32532532536', '304359867893753', '32534745876' ), numeric = c( 4, 5, 6 ), logical = c( FALSE, FALSE, FALSE ) )
ijoinf( df1, df2, by = 'factor.join' )
Shorthand for is.character
ischar(x)
x |
Value to check. |
logical indicator
ischar( 'a character' )
ischar(1)
Shorthand for lubridate::is.Date
isdate(x)
x |
Value to check. |
logical indicator
isdate( lubridate::mdy( '10/1/2014' ) )
isdate(1)
Shorthand for is.factor
isfac(x)
x |
Value to check. |
logical indicator
isfac( factor( c( 'a', 'b', 'c' ) ) )
isfac(1)
Shorthand for is.numeric
isnum(x)
x |
Value to check. |
logical indicator
isnum(1)
isnum( factor( c( 'a', 'b', 'c' ) ) )
Facilitates checking for missing values which may cause errors later in code. NULL values can cause errors on is.na checks, and is.na can cause warnings if it is inside if() and is passed multiple values. This function makes it easier to check for missing values before trying to operate on a variable. It will NOT check for strings like "" or "NA". Only NULL and NA values will return TRUE. Author: Bryce Chamberlain. Tech Review: Maria Gonzalez.
isval(x, na_strings = easyr::nastrings, do.test.each = FALSE)
x |
Object to check. In the case of a data frame or vector, it will check the first (non-NULL) value. |
na_strings |
(Optional) Set the strings you want to consider NA. These will be applied after stringr::str_trim on x. |
do.test.each |
Return a vector of results to check each element instead of checking the entire object. |
True/false indicating if the argument is NA, NULL, or an empty/NA string/vector. For speed, only the first value is checked.
isval( NULL )
isval( NA )
isval( c( NA , NULL ) )
isval( c( 1, 2, 3 ) )
isval( c( NA, 2, 3 ) )
isval( c( 1, 2, NA ) ) # only the first value is checked, so this will come back FALSE.
isval( c( NULL, 2, 3 ) ) # NULL values get skipped in a vector.
isval( data.frame() )
isval( dplyr::group_by( dplyr::select( cars, speed, dist ), speed ) ) # test a tibble.
isval( "#VALUE!" ) # test an excel error code.
Replace a column's values with matches in a different dataset. Author: Bryce Chamberlain.
jrepl( x, y, by, replace.cols, na.only = FALSE, only.rows = NULL, verbose = FALSE, viewalldups = FALSE, warn = FALSE )
x |
Main dataset which will have new values. This data set will be returned with new values. |
y |
Supporting dataset which has the id and new values. |
by |
Vector of join column names. A character vector if the names match. A named character vector if they don't. |
replace.cols |
Vector of replacement column names, similar format as by. |
na.only |
Only replace values that are NA. |
only.rows |
Select rows to be affected. Default checks all rows. |
verbose |
Print via cat information about the replacement. |
viewalldups |
Set to TRUE to see all duplicates |
warn |
Set to TRUE to see warnings. |
x with new values.
df1 = utils::head( sleep )
group.reassign = data.frame( id.num = factor( c( 1, 3, 4 ) ), group.replace = factor( c( 99, 99, 99 ) ) )
jrepl( x = df1, y = group.reassign, by = c( 'ID' = 'id.num' ), replace.cols = c( 'group' = 'group.replace' ) )
# doesn't affect since there are no NAs in group.
jrepl( x = df1, y = group.reassign, by = c( 'ID' = 'id.num' ), replace.cols = c( 'group' = 'group.replace' ), na.only = TRUE )
Behaves like Excel's LEFT, RIGHT, and MID functions. Author: Dave. Tech review: Bryce Chamberlain.
left(string, char)
string |
String to process. |
char |
Number of characters. |
left( "leftmidright", 4 )
Check if a column can be converted to a date. Helpful for checking a column before actually converting it. Author: Bryce Chamberlain. Tech review: Dominic Dillingham.
likedate( x, na_strings = easyr::nastrings, run_unique = TRUE, aggressive.extraction = TRUE )
x |
Value or vector to check. |
na_strings |
Vector of characters to consider NA. Like Date will treat these values like NA. |
run_unique |
Convert to unique variables before checking. In some cases, this can make it take longer than necessary. In most, it will make it faster. |
aggressive.extraction |
todate will take dates inside long strings (like filenames) and convert them to dates. This seems to be the preferred outcome, so we leave it as default (TRUE). However, if you want to avoid this you can do so via this option (FALSE). |
Boolean indicating if the entire vector can be converted to a date.
x <- c('20171124','2017/12/24',NA,'12/24/2017','March 3rd, 2015','Mar 3, 2016')
likedate(x)
likedate(c(123,456,NA))
if(likedate(x)) t <- todate(x)
likedate(lubridate::mdy('1-1-2014'))
likedate( '3312019' )
likedate( '2019.1.3' )
Matches factor levels before left join via merge. Author: Bryce Chamberlain.
ljoinf( data.left, data.right, by, sort.levels = TRUE, restrict.levels = FALSE, na_level = "(Missing)" )
data.left |
Left data. All of this data will be preserved in the join (may still result in duplication). |
data.right |
Right data. Only rows that match the join will be included (may also result in duplication). |
by |
Columns to join on. |
sort.levels |
Sort the factor levels after combining them. |
restrict.levels |
Often the joined data won't use all the levels in both datasets. Set to TRUE to remove factor levels that aren't in the joined data. |
na_level |
some functions don't like factors to have NAs so we replace NAs with this value for factors only. Set NULL to skip. |
Joined data, with any factors modified to contain all levels in the joined data.
df1 = data.frame( factor1 = c( 'a', 'b', 'c' ), factor2 = c( 'high', 'medium', 'low' ), factor.join = c( '0349038u093843', '304359867893753', '3409783509735' ), numeric = c( 1, 2, 3 ), logical = c( TRUE, TRUE, TRUE ) )
df2 = data.frame( factor1 = c( 'd', 'e', 'f' ), factor2 = c( 'low', 'medium', 'high' ), factor.join = c( '32532532536', '304359867893753', '32534745876' ), numeric = c( 4, 5, 6 ), logical = c( FALSE, FALSE, FALSE ) )
ljoinf( df1, df2, by = 'factor.join' )
Modifies two datasets so matching factor columns have the same levels. Typically this is used prior to joining or bind_rows in the easyr functions bindf, ijoinf, ljoinf, and fjoinf.
match.factors(df1, df2, by = NA, sort.levels = TRUE)
df1 |
First data set. |
df2 |
Second data set. |
by |
Columns to join on, comes from the function using match.factors (ljoinf, fjoinf, ijoinf). |
sort.levels |
Sort the factor levels after combining them. |
List of the same data but with factors modified as applicable. All factors are checked if no 'by' argument is passed. Otherwise only the 'by' argument is checked.
df1 = data.frame( factor1 = c( 'a', 'b', 'c' ), factor2 = c( 'high', 'medium', 'low' ), factor.join = c( '0349038u093843', '304359867893753', '3409783509735' ), numeric = c( 1, 2, 3 ), logical = c( TRUE, TRUE, TRUE ) )
df2 = data.frame( factor1 = c( 'd', 'e', 'f' ), factor2 = c( 'low', 'medium', 'high' ), factor.join = c( '32532532536', '304359867893753', '32534745876' ), numeric = c( 4, 5, 6 ), logical = c( FALSE, FALSE, FALSE ) )
t = match.factors( df1, df2 )
levels( df1$factor1 )
levels( t[[1]]$factor1 )
levels( t[[2]]$factor1 )
Date Difference in Months
mdiff(x, y, do.date.convert = TRUE, do.numeric = TRUE)
x |
Vector of starting dates or items that can be converted to dates by todate. |
y |
Vector of ending dates or items that can be converted to dates by todate. |
do.date.convert |
Convert to dates before running the difference. If you know your columns are already dates, setting to FALSE will make your code run faster. |
do.numeric |
Convert the output to a number instead of a date difference object. |
Vector of differences.
mdiff( lubridate::mdy( '1/1/2018' ), lubridate::mdy( '3/4/2018' ) )
Behaves like Excel's LEFT, RIGHT, and MID functions. Author: Bryce Chamberlain.
mid(string, start, nchars)
string |
String to process. |
start |
Index (1-index) to start at. |
nchars |
Number of characters to read in from start. |
mid( "leftmidright", 5, 3 )
Shorthand for is.na
na(x)
x |
Value to check. |
logical indicator
na(NA)
na(1)
Get column names that match a pattern. Author: Scott Sobel. Tech review: Bryce Chamberlain.
namesx(df, char, fixed = TRUE, ignore.case = TRUE)
df |
Object with names you'd like to search. |
char |
Regex character to match to columns. |
fixed |
Match as a string, not a regular expression. |
ignore.case |
Ignore case in matches. |
Vector of matched names.
namesx( iris,'len' )
namesx( iris,'Len' )
Shorthand for is.nan
nan(x)
x |
Value to check. |
logical indicator
nan( NaN )
nan(1)
Facilitates checking for missing values which may cause errors later in code. NULL values can cause errors on is.na checks, and is.na can cause warnings if it is inside if() and is passed multiple values. This function makes it easier to check for missing values before trying to operate on a variable. It will NOT check for strings like "" or "NA". Only NULL and NA values will return TRUE. Author: Bryce Chamberlain. Tech Review: Maria Gonzalez.
nanull(x, na_strings = easyr::nastrings, do.test.each = FALSE)
x |
Vector to check. In the case of a data frame or vector, it will check the first (non-NULL) value. |
na_strings |
(Optional) Set the strings you want to consider NA. These will be applied after stringr::str_trim on x. |
do.test.each |
Return a vector of results to check each element instead of checking the entire object. |
True/false indicating if the argument is NA, NULL, or an empty/NA string/vector. For speed, only the first value is checked.
nanull( NULL )
nanull( NA )
nanull( c( NA , NULL ) )
nanull( c( 1, 2, 3 ) )
nanull( c( NA, 2, 3 ) )
nanull( c( 1, 2, NA ) ) # only the first value is checked, so this will come back FALSE.
nanull( c( NULL, 2, 3 ) ) # NULL values get skipped in a vector.
nanull( data.frame() )
nanull( dplyr::group_by( dplyr::select( cars, speed, dist ), speed ) ) # test a tibble.
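With do.test.each = TRUE the check is element-wise instead of a single value for the whole object; a short sketch based on the argument description above:
# returns one TRUE/FALSE per element.
nanull( c( 1, NA, 3 ), do.test.each = TRUE )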
A list of strings to consider NA. Includes blank string, "NA", excel errors, etc. Used throughout easyr for checking NA.
nastrings
A vector of values.
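A quick way to inspect the strings easyr treats as NA, and to extend them for a one-off conversion (the extra value here is just an illustration):
# look at the built-in NA strings.
head( easyr::nastrings )
# add a custom NA string for a single call, e.g. when checking character numbers.
charnum( c( '1', 'not applicable' ), na_strings = c( easyr::nastrings, 'not applicable' ) )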
Shorthand for is.null
null(x)
x |
Value to check. |
logical indicator
null( NULL )
null(1)
Adds leading zeros to a numeric vector to make each value a specific length. For values shorter than length passed, leading zeros are removed. Author: Scott Sobel. Tech Review: Bryce Chamberlain.
pad0(x, len)
x |
Vector. |
len |
Number of characters you want in each value. |
Character vector with padded values.
pad0( c(123,00123,5), len = 5 )
pad0( c(123,00123,5), len = 2 )
pad0( '1234', 5 )
Date Difference in Quarters
qdiff(x, y, do.date.convert = TRUE, do.numeric = TRUE)
x |
Vector of starting dates or items that can be converted to dates by todate. |
y |
Vector of ending dates or items that can be converted to dates by todate. |
do.date.convert |
Convert to dates before running the difference. If you know your columns are already dates, setting to FALSE will make your code run faster. |
do.numeric |
Convert the output to a number instead of a date difference object. |
Vector of differences.
qdiff( lubridate::mdy( '1/1/2018' ), lubridate::mdy( '3/4/2018' ) )
Code to fix column names, since this has to be done up to twice while reading in files. It should NOT be used directly (that's why it isn't exported), but will be called by function [read.any] as necessary, with the applicable defaults set by that function.
rany_fixColNames(col_names, fix.dup.column.names, nastrings)
col_names |
Vector/value of column names/name. |
fix.dup.column.names |
Adds 'DUPLICATE #' to duplicated column names to avoid errors with duplicate names. |
nastrings |
Characters/strings to read as NA. |
Fixed names.
Flexible read function to handle many types of files. Currently handles CSV, TSV, DBF, RDS, XLS (incl. when formatted as HTML), and XLSX. Also handles common issues like strings being read in as factors (strings are NOT read in as factors by this function, you'd need to convert them later). Author: Bryce Chamberlain. Tech Review: Dominic Dillingham.
read.any( filename = NA, folder = NA, sheet = 1, file_type = "", first_column_name = NA, header = is.null(widths), headers_on_row = NA, nrows = -1L, row.names.column = NA, row.names.remove = TRUE, make.names = FALSE, field_name_map = NA, require_columns = NA, all_chars = FALSE, auto_convert_dates = TRUE, allow_times = FALSE, check_numbers = TRUE, nazero = FALSE, check_logical = TRUE, stringsAsFactors = FALSE, na_strings = easyr::nastrings, na_level = "(Missing)", ignore_rows_with_na_at = NA, drop.na.cols = TRUE, drop.na.rows = TRUE, fix.dup.column.names = TRUE, do.trim.sheetname = TRUE, x = NULL, isexcel = FALSE, encoding = "unknown", verbose = TRUE, widths = NULL, col.names = NULL )
filename |
File path and name for the file to be read in. |
folder |
Folder path to look for the file in. |
sheet |
The sheet to read in. |
file_type |
Specify the file type (CSV, TSV, DBF, FWF). If not provided, R will use the file extension to determine the file type. Useful when the file extension doesn't indicate the file type, like .rpt, etc. |
first_column_name |
Define headers location by providing the name of the left-most column. Alternatively, you can choose the row via the [headers_on_row] argument. |
header |
Choose if your file contains headers. |
headers_on_row |
Choose a specific row number to use as headers. Use this when you want to tell read.any exactly where the headers are. |
nrows |
Number of rows to read. Leave blank/NA to read all rows. This only speeds up file reads (CSV, XLSX, etc.), not compressed data that must be read all at once. This is applied BEFORE headers_on_row or first_column_name removes top rows, so it should be greater than those values if headers aren't in the first row. |
row.names.column |
Specify the column (by character name) to use for row names. This drops the column and lets rows be referenced directly by this id. Values must be unique. |
row.names.remove |
If you move a column to row names, it is removed from the data by default. If you'd like to keep it, set this to FALSE. |
make.names |
Apply make.names function to make column names R-friendly (replaces non-characters with ., starting numbers with x, etc.) |
field_name_map |
Rename fields for consistency. Provide as a named vector where the names are the file's names and the vector values are the output names desired. See examples for how to create this input. |
require_columns |
List of required columns to check for. Calls stop() with helpful message if any aren't found. |
all_chars |
Keep all column types as characters. This makes using bind_rows easier; you can then use atype() later to set types. |
auto_convert_dates |
Identify date fields and automatically convert them to dates |
allow_times |
Times are not allowed when reading data in, to facilitate easy binding. If you need times though, set this to TRUE. |
check_numbers |
Identify numbers formatted as characters and convert them accordingly. |
nazero |
Convert NAs in numeric columns to 0. |
check_logical |
Identify logical columns formatted as characters (Yes/No, etc.) or numbers (0,1) and convert them accordingly. |
stringsAsFactors |
Convert characters to factors to increase processing speed and reduce file size. |
na_strings |
Strings to treat like NA. By default we use the easyr NA strings. |
na_level |
dplyr doesn't like factors to have NAs so we replace NAs with this value for factors only. Set NULL to skip. |
ignore_rows_with_na_at |
Vector or value, numeric or character, identifying column(s) that require a value. read.any will remove these rows after colname swaps and read, before type conversion. Especially helpful for removing things like page numbers at the bottom of an excel report that break type discovery. Suggest using the claim number column here. |
drop.na.cols |
Drop columns with only NA values. |
drop.na.rows |
Drop rows with only NA values. |
fix.dup.column.names |
Adds 'DUPLICATE #' to duplicated column names to avoid issues with multiple columns having the same name. |
do.trim.sheetname |
read.any will trim sheet names to get better matches. This will cause an error if the actual sheet name has spaces on the left or right side. Disable this functionality here. |
x |
If you want to use read.any functionality on an existing data frame, pass it with this argument. |
isexcel |
If you want to use read.any functionality on an existing data frame, you can tell read.any that this data came from excel using isexcel manually. This comes in handy when excel-integer date conversions are necessary. |
encoding |
Encoding passed to fread and read.csv. |
verbose |
Print helpful information via cat. |
widths |
Column widths. Only use for fixed width files. |
col.names |
Column names. Only use for fixed width files. |
Data frame with the data that was read.
folder = system.file('extdata', package = 'easyr')
read.any('date-time.csv', folder = folder)
# if dates are being converted incorrectly, disable date conversion:
read.any('date-time.csv', folder = folder, auto_convert_dates = FALSE)
# to handle type conversions manually:
read.any('date-time.csv', folder = folder, all_chars = TRUE)
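The field_name_map argument above asks for a named vector mapping the file's column names to the names you want. A hedged sketch; the file name and column names here are hypothetical and only illustrate the format.
## Not run: 
# names are the file's headers, values are the output names.
read.any( 'claims.xlsx', folder = 'data',
  field_name_map = c( 'Claim Number' = 'claim_id', 'Loss Date' = 'loss_date' ),
  require_columns = c( 'claim_id', 'loss_date' ) )
## End(Not run)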
Read File as Text
read.txt(filename, folder = NA)
filename |
File path and name for the file to be read in. |
folder |
Folder path to look for the file in. |
Character variable containing the text in the file.
# write a file.
path = tempfile()
cat( "some text", file = path )
# read the file.
read.txt( path )
# clean up.
file.remove( path )
Behaves like Excel's LEFT, RIGHT, and MID functions. Author: Dave. Tech review: Bryce Chamberlain.
right(string, char)
string |
String to process. |
char |
Number of characters. |
right( "leftmidright",5 )
Run all the R scripts in a folder. Author: Bryce Chamberlain.
runfolder( path, recursive = FALSE, is.local = TRUE, check.fn = NULL, run.files = NULL, verbose = TRUE, edit.on.err = TRUE, pattern = "[.][Rr]$" )
path |
Folder to run. |
recursive |
Run all folder children also. |
is.local |
Code is running on a local machine, not a Shiny server. Helpful for skipping items that can be problematic on the server. In this case, printing to the log. |
check.fn |
Function to run after each file is read in. |
run.files |
Optionally pass the list of files to run. Otherwise, list.files will be run on the folder. |
verbose |
Print names of files and run-time via cat. |
edit.on.err |
Open the running file if an error occurs. |
pattern |
Passed to list.files. Pattern to match/filter files. |
# runfolder( 'R' )
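A self-contained sketch that writes a tiny script to a temporary folder and runs everything in it; the folder and script names are arbitrary.
# write a small script to a temp folder, then run the folder.
scripts = file.path( tempdir(), 'scripts' )
dir.create( scripts, showWarnings = FALSE )
cat( "x <- 1 + 1", file = file.path( scripts, 'calc.R' ) )
runfolder( scripts )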
This gets a bit complex since many errors can occur when reading in excel files. We've done our best to handle common ones. Requires packages: openxlsx, readxl, XML (these are required by easyr). It should NOT be used directly (that's why it isn't exported), but will be called by function [read.any] as necessary, with the applicable defaults set by that function.
rx(filename, sheet, first_column_name, nrows, verbose)
filename |
File path and name for the file to be read in. |
sheet |
The sheet to read in. |
first_column_name |
Pass a column name to help the function find the header row. |
nrows |
Number of rows to read in. |
verbose |
Print helpful messages via cat(). |
Data object
Save Cache
Saves the arguments to a cache file, using the cache.num last checked with cache.ok.
save.cache(...)
... |
Objects to save. |
# check the first cache to see if it exists and dependent files haven't changed.
# if this check is TRUE, code in brackets will get skipped and the cache will be loaded instead.
# set do.load = FALSE if you have multiple files that build a cache,
# to prevent multiple cache loads.
# output will be printed to the console to tell you if the cache was loaded or re-built.
## Not run: 
if( ! cache.ok(1) ){
  # do stuff
  # if this is the final file for this cache,
  # end with save.cache to save passed objects as a cache.
  save.cache(iris)
}
## End(Not run)
Searches all columns for a term and returns all rows with at least one match. Author: Bryce Chamberlain.
sch( x, pattern, ignore.case = FALSE, fixed = FALSE, pluscols = NULL, exact = FALSE, trim = TRUE, spln = NULL )
sch( x, pattern, ignore.case = FALSE, fixed = FALSE, pluscols = NULL, exact = FALSE, trim = TRUE, spln = NULL )
x |
Data to search. |
pattern |
Regex pattern to search for. Most normal search terms will work fine, too. |
ignore.case |
Ignore case in search (uses grepl). |
fixed |
Passed to grepl to match string as-is instead of using regex. See ?grepl. |
pluscols |
Choose columns to return in addition to those where matches are found. Can be a name, number, or 'all' to bring back all columns. |
exact |
Find exact matches instead of pattern matching. |
trim |
Use trimws to trim columns before exact matching. |
spln |
Sample the data using easyr::spl() before searching. This speeds up searching in large datasets when you only need to identify matching columns, not every row that matches. See the n argument in ?spl for more info. |
Matching rows.
sch( iris, 'seto' ) sch( iris, 'seto', pluscols='all' ) sch( iris, 'seto', pluscols='Sepal.Width' ) sch( iris, 'seto', exact = TRUE ) # message no matches and return NULL
sch( iris, 'seto' ) sch( iris, 'seto', pluscols='all' ) sch( iris, 'seto', pluscols='Sepal.Width' ) sch( iris, 'seto', exact = TRUE ) # message no matches and return NULL
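Two further illustrative calls built from the arguments documented above (not from the package docs): a case-insensitive search and a sampled search via spln.
sch( iris, 'SETO', ignore.case = TRUE ) # case-insensitive match
sch( iris, 'seto', spln = 50 )          # sample 50 rows first; useful on large data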
Search for similar strings in a vector.
similar_text( search, context, algo = "jaccard", level = 0.5, return_similarity = FALSE )
similar_text( search, context, algo = "jaccard", level = 0.5, return_similarity = FALSE )
search |
Single character/string to search for. |
context |
Vector of characters to search within. |
algo |
Algorithm to use when determining similarity. Currently, only Jaccard Similarity is implemented. |
level |
Returned characters will be this similar or more similar. Higher values will return fewer/closer matches. |
return_similarity |
Special option for diagnosing. TRUE will ignore [level] and return a named vector where the name is the context value and the value is the similarity. |
Characters that meet the similarity requirement.
similar_text('foobar', c('foo', 'bar', 'foobars')) similar_text('foobar', c('foo', 'bar', 'foobars'), return_similarity = TRUE)
similar_text('foobar', c('foo', 'bar', 'foobars')) similar_text('foobar', c('foo', 'bar', 'foobars'), return_similarity = TRUE)
Extracts a uniform random sample from a dataset or vector. Provides a simpler API than base R. Author: Bryce Chamberlain. Tech Review: Maria Gonzalez.
spl(x, n = 10, warn = TRUE, replace = FALSE, seed = NULL, ...)
spl(x, n = 10, warn = TRUE, replace = FALSE, seed = NULL, ...)
x |
Data to sample from. |
n |
Number or percentage of rows/values to return. If less than 1 it will be interpreted as a percentage. |
warn |
Warn if sampling more than the size of the data. |
replace |
Whether or not to sample with replacement. |
seed |
Set a seed to allow consistent/replicable sampling. |
... |
Other parameters passed to sample() |
Sample dataframe/vector.
spl( c(1:100) ) spl( c(1:100), n = 50 ) spl( iris )
spl( c(1:100) ) spl( c(1:100), n = 50 ) spl( iris )
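Two more illustrative calls based on the arguments above (not from the package docs): a percentage sample and a seeded, reproducible sample.
spl( 1:100, n = 0.1 )        # n < 1 is read as a percentage, so this returns 10 values
spl( iris, n = 5, seed = 1 ) # the same 5 rows every time the seed is set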
Helpful info for U.S. states. Right now, just a mapping of abbreviations to names.
states
states
Data frame.
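A quick way to inspect the object (illustrative only; the column names are not documented here):
head( easyr::states )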
Runs the str function, but only for names matching a character value (regex). Author: Scott Sobel. Tech Review: Bryce Chamberlain.
strx(df, char, ignore.case = T)
strx(df, char, ignore.case = T)
df |
Object with names you'd like to search. |
char |
Regex (character value) to match. |
ignore.case |
(Optional) Ignore case when matching. |
strx(iris,'length')
strx(iris,'length')
Easily summarize all numeric variables. Helpful for flexibly summarizing without knowing the columns. Defaults to sum, but you can also pass a custom function. Typically pass in a data frame after group_by.
sumnum(x, do.fun = NULL, except = c(), do.ungroup = TRUE, ...)
sumnum(x, do.fun = NULL, except = c(), do.ungroup = TRUE, ...)
x |
Grouped tibble to summarize. |
do.fun |
Function to use for the summary. Passed to dplyr::summarize(). Can be a custom function. Defaults to sum(). |
except |
Columns names, numbers, or a logical vector indicating columns NOT to summarize. |
do.ungroup |
Run dplyr::ungroup() after summarizing to prevent future issues with grouping. |
... |
Extra args passed to dplyr::summarize() which are applied as arguments to the function passed in do.fun. |
Summarized data frame or tibble.
require(dplyr) require(easyr) sumnum( group_by( cars, speed ) ) sumnum( group_by( cars, speed ), mean ) sumnum( cars )
require(dplyr) require(easyr) sumnum( group_by( cars, speed ) ) sumnum( group_by( cars, speed ), mean ) sumnum( cars )
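Two further illustrative calls using the arguments documented above (not from the package docs): excluding a column and passing an extra argument through to the summary function.
require(dplyr)
sumnum( group_by( iris, Species ), except = 'Sepal.Width' ) # leave one numeric column out
sumnum( group_by( iris, Species ), mean, na.rm = TRUE )     # na.rm is passed to mean() via ...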
Easy Try/Catch implementation to return the same message on error or warning. Makes it easier to write tryCatches. Author: Bryce Chamberlain. Tech review: Lindsay Smelzter.
tcmsg(code_block, ...)
tcmsg(code_block, ...)
code_block |
Code to run in Try Catch. |
... |
Strings to concatenate to form the message that is returned. |
tryCatch({ tcmsg({ NULL = 1 }, 'Cannot assign to NULL','variable' ) }, error = function(e) print( e ) ) tryCatch({ tcmsg({ as.numeric('abc') },'Issue in as.numeric()') }, warning = function(e) print( e ) )
tryCatch({ tcmsg({ NULL = 1 }, 'Cannot assign to NULL','variable' ) }, error = function(e) print( e ) ) tryCatch({ tcmsg({ as.numeric('abc') },'Issue in as.numeric()') }, warning = function(e) print( e ) )
Transpose operation that sets column names equal to a column in the original data. Author: Bryce Chamberlain.
tcol(x, header, cols.colname = "col", do.atype = TRUE)
tcol(x, header, cols.colname = "col", do.atype = TRUE)
x |
Data frame to be transposed. |
header |
Column name/number to be used as column names of transposed data. |
cols.colname |
Name to use for the column of column names in the transposed data. |
do.atype |
Transposing converts values to strings, since data types are uncertain. Run atype to automatically correct variable typing where possible. This will slow the result a bit. |
Transposed data frame.
# create a summary dataset from iris. x = dplyr::summarize_at( dplyr::group_by( iris, Species ), dplyr::vars( Sepal.Length, Sepal.Width ), list(sum) ) # run tcol tcol( x, 'Species' )
# create a summary dataset from iris. x = dplyr::summarize_at( dplyr::group_by( iris, Species ), dplyr::vars( Sepal.Length, Sepal.Width ), list(sum) ) # run tcol tcol( x, 'Species' )
Easy Try/Catch implementation to return the same message as a warning on error or warning. Makes it easier to write tryCatches. Author: Bryce Chamberlain. Tech review: Lindsay Smelzter.
tcwarn(code_block, ...)
tcwarn(code_block, ...)
code_block |
Code to run in Try Catch. |
... |
Strings to concatenate to form the message that is returned. |
tryCatch({ tcwarn({ NULL = 1 },'Cannot assign to NULL','variable') }, warning = function(e) print( e ) ) tryCatch({ tcwarn({ as.numeric('abc') },'Issue in as.numeric()') }, warning = function(e) print( e ) )
tryCatch({ tcwarn({ NULL = 1 },'Cannot assign to NULL','variable') }, warning = function(e) print( e ) ) tryCatch({ tcwarn({ as.numeric('abc') },'Issue in as.numeric()') }, warning = function(e) print( e ) )
Flexible boolean conversion. Author: Bryce Chamberlain.
tobool( x, preprocessed.values = NULL, nastrings = easyr::nastrings, ifna = c("return-unchanged", "error", "warning", "return-na"), verbose = TRUE, true.vals = c("true", "1", "t", "yes"), false.vals = c("false", "0", "f", "no") )
tobool( x, preprocessed.values = NULL, nastrings = easyr::nastrings, ifna = c("return-unchanged", "error", "warning", "return-na"), verbose = TRUE, true.vals = c("true", "1", "t", "yes"), false.vals = c("false", "0", "f", "no") )
x |
Value or vector to be converted. |
preprocessed.values |
Strings need to have NAs set, lowercase and be trimmed before they can be checked. To avoid doing this multiple times, you can pass these processed values to the function. |
nastrings |
Vector of characters to be considered NAs. tobool will treat these like NAs. Defaults to the easyr::nastrings list. |
ifna |
Action to take if NAs are created. 'return-unchanged' returns the sent vector unchanged; 'warning' results in a warning and returns the converted vector with new NAs; 'error' results in an error. |
verbose |
Choose to view messaging. |
true.vals |
Values to consider as TRUE. |
false.vals |
Values to consider as FALSE. |
Converted logical vector.
tobool( c( 'true', 'FALSE', 0, 1, NA, 'yes', 'NO' ) )
tobool( c( 'true', 'FALSE', 0, 1, NA, 'yes', 'NO' ) )
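An illustrative call (not from the package docs) that extends the accepted values to hypothetical single-letter flags:
tobool( c( 'y', 'n', NA ), true.vals = c( 'true', '1', 't', 'yes', 'y' ), false.vals = c( 'false', '0', 'f', 'no', 'n' ) )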
Shorthand for as.character
tochar(x)
tochar(x)
x |
Value to check. |
as.character result
tochar(NA) tochar(1)
tochar(NA) tochar(1)
Flexible date conversion function using lubridate. Works with dates in many formats, without needing to know the format in advance. Only use this if you don't know the format of the dates before hand. Otherwise, lubridate functions parse_date_time, mdy, etc. should be used. Author: Bryce Chamberlain. Tech review: Dominic Dillingham.
todate( x, nastrings = easyr::nastrings, aggressive.extraction = TRUE, preprocessed.values = NULL, ifna = c("return-unchanged", "error", "warning", "return-na"), verbose = TRUE, allow_times = FALSE, do.month.char = TRUE, do.excel = TRUE, min.acceptable = lubridate::ymd("1920-01-01"), max.acceptable = lubridate::ymd("2050-01-01") )
todate( x, nastrings = easyr::nastrings, aggressive.extraction = TRUE, preprocessed.values = NULL, ifna = c("return-unchanged", "error", "warning", "return-na"), verbose = TRUE, allow_times = FALSE, do.month.char = TRUE, do.excel = TRUE, min.acceptable = lubridate::ymd("1920-01-01"), max.acceptable = lubridate::ymd("2050-01-01") )
x |
Value or vector to be converted. |
nastrings |
Vector of characters to be considered NAs. todate will treat these like NAs. Defaults to the easyr::nastrings list. |
aggressive.extraction |
todate will take dates inside long strings (like filenames) and convert them to dates. This seems to be the preferred outcome, so we leave it as default (TRUE). However, if you want to avoid this you can do so via this option (FALSE). |
preprocessed.values |
Strings need to have NAs set, lowercase and be trimmed before they can be checked. To avoid doing this multiple times, you can pass these processed values to the function. |
ifna |
Action to take if NAs are created. 'return-unchanged' returns the sent vector unchanged; 'warning' results in a warning and returns the converted vector with new NAs; 'error' results in an error; 'return-na' returns new NAs without a warning. |
verbose |
Choose to view messaging. |
allow_times |
Set to TRUE to allow DateTimes as output, otherwise this will always convert to Dates (losing time information). This is better for binding data, hence the default FALSE. |
do.month.char |
Attempt to convert month names in text. lubridate does this by default, but sometimes it can result in inaccurate dates. For example, "Feb 2017" is converted to 2-20-2017 even though no day was given. |
do.excel |
Check for excel-formatted numbers. |
min.acceptable |
Set NA if converted value is less than this value. Helps to prevent numbers from being assumed as dates. Set NULL to skip this check. Does not affect character conversions. |
max.acceptable |
Set NA if converted value is greater than this value. Helps to prevent numbers from being assumed as dates. Set NULL to skip this check. Does not affect character conversions. |
Converted vector using lubridate::parse_date_time(x,c('mdy','ymd','dmy'))
x <- c( '20171124', '2017/12/24', NA, '12/24/2017', '5/11/2017 1:51PM' ) x2 <- todate(x) x2
x <- c( '20171124', '2017/12/24', NA, '12/24/2017', '5/11/2017 1:51PM' ) x2 <- todate(x) x2
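Two further illustrative calls based on the arguments above (not from the package docs): keeping the time component and widening the acceptable date window.
todate( '5/11/2017 1:51PM', allow_times = TRUE )                    # keep the time instead of flattening to a date
todate( '1/1/1900', min.acceptable = lubridate::ymd('1850-01-01') ) # accept dates earlier than the default 1920 cutoff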
Flexible number conversion for converting strings to numbers. Handles $ , ' and spaces. Author: Bryce Chamberlain. Tech review: Dominic Dillingham.
tonum( x, preprocessed.values = NULL, nastrings = easyr::nastrings, ifna = c("return-unchanged", "error", "warning", "return-na"), verbose = TRUE, nazero = FALSE, checkdate = TRUE, remove.chars = FALSE, do.logical = TRUE, do.try.integer = TRUE, multipliers = c(`%` = 1/100, K = 1000, M = 1000^2, B = 1000^3) )
tonum( x, preprocessed.values = NULL, nastrings = easyr::nastrings, ifna = c("return-unchanged", "error", "warning", "return-na"), verbose = TRUE, nazero = FALSE, checkdate = TRUE, remove.chars = FALSE, do.logical = TRUE, do.try.integer = TRUE, multipliers = c(`%` = 1/100, K = 1000, M = 1000^2, B = 1000^3) )
x |
Vector to convert. |
preprocessed.values |
Strings need to have NAs set, lowercase and be trimmed before they can be checked. To avoid doing this multiple times, you can pass these processed values to the function. |
nastrings |
Vector of characters to be considered NAs. tonum will treat these like NAs. Defaults to the easyr::nastrings list. |
ifna |
Action to take if NAs are created. 'return-unchanged' returns the sent vector unchanged; 'warning' results in a warning and returns the converted vector with new NAs; 'error' results in an error; return-na returns data with new NAs and prints via cat if verbose. |
verbose |
Choose to view messaging. |
nazero |
(Optional) Convert NAs to 0. Defaults to FALSE, in which case NAs stay NA. |
checkdate |
Check if the column is a date first. If this has already been done, set this to FALSE so it doesn't run again. |
remove.chars |
Remove characters for aggressive conversion to numbers. |
do.logical |
Check for logical-form vectors. |
do.try.integer |
Return an integer if possible. Integers are a more compact data type and should be used whenever possible. |
multipliers |
Named vector of factor symbols and values to check. Setting to NULL may speed up operations. |
Converted vector.
tonum( c('123','$50.02','30%','(300.01)',NA,'-','') ) tonum( c('123','$50.02','30%','(300.01)',NA,'-',''), nazero = FALSE ) tonum( c( '$(3,891)M', '4B', '3.41K', '30', '40K' ) )
tonum( c('123','$50.02','30%','(300.01)',NA,'-','') ) tonum( c('123','$50.02','30%','(300.01)',NA,'-',''), nazero = FALSE ) tonum( c( '$(3,891)M', '4B', '3.41K', '30', '40K' ) )
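An illustrative call (not from the package docs) showing the aggressive remove.chars option; the input string is hypothetical and, assuming the non-numeric text is stripped, it should yield 1200.
tonum( '1,200 units', remove.chars = TRUE ) # strip the text, keep the number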
Installs a package if it needs to be installed, and calls require to load the package. Author: Scott Sobel. Tech Review: Bryce Chamberlain.
usepkg(packages, noCache = FALSE, repos = "http://cran.us.r-project.org")
usepkg(packages, noCache = FALSE, repos = "http://cran.us.r-project.org")
packages |
Character or character vector with names of the packages you want to use. |
noCache |
When checking packages, you can choose to ignore the cached list, which will increase accuracy but decrease speed. |
repos |
Choose the URL to install from. |
# packages shouldn't be installed during tests or examples according to CRAN. # therefore, examples cannot be provided because CRAN now runs donttest examples. usepkg('geodist', FALSE, 'http://cran.us.r-project.org')
# packages shouldn't be installed during tests or examples according to CRAN. # therefore, examples cannot be provided because CRAN now runs donttest examples. usepkg('geodist', FALSE, 'http://cran.us.r-project.org')
Check various properties of 2 data frames to ensure they are equivalent.
validate.equal( df1, df2, id.column = NULL, regex.remove = "[^A-z0-9.+\\/,-]", do.set.NA = TRUE, nastrings = easyr::nastrings, match.round.to.digits = 4, do.all.columns.before.err = FALSE, check.column.order = FALSE, sort.by.id = TRUE, acceptable.pct.rows.diff = 0, acceptable.pct.vals.diff = 0, return.summary = FALSE, verbose = TRUE )
validate.equal( df1, df2, id.column = NULL, regex.remove = "[^A-z0-9.+\\/,-]", do.set.NA = TRUE, nastrings = easyr::nastrings, match.round.to.digits = 4, do.all.columns.before.err = FALSE, check.column.order = FALSE, sort.by.id = TRUE, acceptable.pct.rows.diff = 0, acceptable.pct.vals.diff = 0, return.summary = FALSE, verbose = TRUE )
df1 |
First data frame to compare. |
df2 |
Second data frame to compare. |
id.column |
If available, a column to use as an ID. Helpful in various checks and output. |
regex.remove |
Pattern to remove from strings. Used in gsub to remove characters we don't want to consider when comparing values. Set to NULL, NA, or "" to leave strings unchanged. |
do.set.NA |
Remove NA strings. |
nastrings |
Strings to consider NA. |
match.round.to.digits |
Round numbers to these digits before checking equality. |
do.all.columns.before.err |
Check all columns before returning an error. Takes longer but returns more detail. If FALSE, stops at first column that doesn't match and returns mismatches. |
check.column.order |
Enforce same column order. |
sort.by.id |
Sort by the id column before making comparisons. |
acceptable.pct.rows.diff |
If you are OK with differences in a few rows, set this value. If the percentage of mismatched rows in a column is below it, the function will consider the columns equivalent. Interpreted as a percentage (it gets divided by 100). |
acceptable.pct.vals.diff |
If you are OK with small differences in values, set this value. If the absolute percentage difference between numeric values is below it, the function will consider the values equivalent. Interpreted as a percentage (it gets divided by 100). |
return.summary |
Return 2 items in a list, the row mismatches and a summary of row mismatches. |
verbose |
Print helpful information via cat(). |
May return information about mismatches. Otherwise doesn't return anything (NULL).
validate.equal( iris, iris )
validate.equal( iris, iris )
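An illustrative comparison (not from the package docs): introduce a single mismatched value and allow up to 5 percent of rows per column to differ. Exact output depends on the function's defaults.
iris2 <- iris
iris2$Sepal.Length[1] <- 99 # one mismatched value out of 150 rows
validate.equal( iris, iris2, acceptable.pct.rows.diff = 5 )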
Improved write function. Writes to csv without row names and automatically adds .csv to the file name if it isn't there already. Changes to .csv if another extension is passed. Easier to type than write.csv(row.names=F). Author: Bryce Chamberlain. Tech review: Maria Gonzalez.
w(x, filename = "out", row.names = FALSE, na = "")
w(x, filename = "out", row.names = FALSE, na = "")
x |
Data frame to write to file. |
filename |
(Optional) Filename to use. |
row.names |
(Optional) Specify if you want to include row names/numbers in the output file. |
na |
(Optional) String to print for NAs. Defaults to an empty/blank string. |
# write the cars dataset. path = paste0( tempdir(), '/out.csv' ) w( cars, path ) # cleanup. file.remove( path )
# write the cars dataset. path = paste0( tempdir(), '/out.csv' ) w( cars, path ) # cleanup. file.remove( path )
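An illustrative call (not from the package docs) showing the extension handling described above: a .txt path is written as .csv.
path = file.path( tempdir(), 'out.txt' )
w( cars, path )
# the file is written with a .csv extension instead; cleanup.
file.remove( file.path( tempdir(), 'out.csv' ) )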
Converts dates formatted as long integers from Excel to Date format in R, accounting for known Excel leap year errors. Author: Bryce Chamberlain. Tech review: Dominic Dillingham.
xldate( x, origin = "1899-12-30", nastrings = easyr::nastrings, preprocessed.values = NULL, ifna = c("return-unchanged", "error", "warning", "return-na"), verbose = TRUE, allow_times = FALSE, do.month.char = TRUE, min.acceptable = lubridate::ymd("1920-01-01"), max.acceptable = lubridate::ymd("2050-01-01") )
xldate( x, origin = "1899-12-30", nastrings = easyr::nastrings, preprocessed.values = NULL, ifna = c("return-unchanged", "error", "warning", "return-na"), verbose = TRUE, allow_times = FALSE, do.month.char = TRUE, min.acceptable = lubridate::ymd("1920-01-01"), max.acceptable = lubridate::ymd("2050-01-01") )
x |
Vector of values. |
origin |
Zero value to use in date conversion. Older version of excel might use a different value. |
nastrings |
Vector of characters to be considered NAs. xldate will treat these like NAs. Defaults to the easyr::nastrings list. |
preprocessed.values |
Strings need to have NAs set, lowercase and be trimmed before they can be checked. To avoid doing this twice, you can tell the function that it has already been done. |
ifna |
Action to take if NAs are created. 'return-unchanged' returns the sent vector unchanged; 'warning' results in a warning and returns the converted vector with new NAs; 'error' results in an error. |
verbose |
Choose to view messaging. |
allow_times |
Return values with time, not just the date. |
do.month.char |
Convert month character names like Feb, March, etc. |
min.acceptable |
Set NA if converted value is less than this value. Helps to prevent numbers from being assumed as dates. Set NULL to skip this check. |
max.acceptable |
Set NA if converted value is greater than this value. Helps to prevent numbers from being assumed as dates. Set NULL to skip this check. |
Vector of converted values.
xldate( c('7597', '42769', '47545', NA ) )
xldate( c('7597', '42769', '47545', NA ) )
Date Difference in Years
ydiff(x, y, do.date.convert = TRUE, do.numeric = TRUE)
ydiff(x, y, do.date.convert = TRUE, do.numeric = TRUE)
x |
Vector of starting dates or items that can be converted to dates by todate. |
y |
Vector of ending dates or items that can be converted to dates by todate. |
do.date.convert |
Convert to dates before running the difference. If you know your columns are already dates, setting to FALSE will make your code run faster. |
do.numeric |
Convert the output to a number instead of a date difference object. |
Vector of differences.
ydiff( lubridate::mdy( '1/1/2018' ), lubridate::mdy( '3/4/2018' ) )
ydiff( lubridate::mdy( '1/1/2018' ), lubridate::mdy( '3/4/2018' ) )
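An illustrative call (not from the package docs): character inputs are converted by todate() first when do.date.convert = TRUE, so date strings work directly.
ydiff( '1/1/2018', '3/4/2018' )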