This release includes :

* Use of `@inheritParams` to simplify the documentation of function arguments #38. This leads to some renaming of arguments (e.g. `path_to_csv` -> `path_to_file` …)
* `compression` and `compression_level` are now passed to the `write_parquet_at_once` and `write_parquet_by_chunk` functions, and are now available in the main conversion functions of parquetize #36 (see the sketch after this list)
* Grouping of the `@importFrom` directives in a single file to facilitate their maintenance #37
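
For illustration, a minimal sketch of these options in one of the main conversion functions (the input path is hypothetical, and `"zstd"` with level 10 are just example values):

```r
# Hypothetical input path, for illustration only.
csv_to_parquet(
  path_to_file = "data/iris.csv",  # renamed from path_to_csv
  path_to_parquet = tempdir(),
  compression = "zstd",            # now forwarded to the parquet writer
  compression_level = 10
)
```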

This release includes :

You can convert to parquet any query you want on any DBI compatible RDBMS, with the new `dbi_to_parquet()` function :

```r
dbi_connection <- DBI::dbConnect(RSQLite::SQLite(),
  system.file("extdata", "iris.sqlite", package = "parquetize"))

# Reading iris table from local sqlite database
# and conversion to one parquet file :
dbi_to_parquet(
  conn = dbi_connection,
  sql_query = "SELECT * FROM iris",
  path_to_parquet = tempdir(),
  parquetname = "iris"
)
```

You can find more information in the `dbi_to_parquet` documentation.

Two arguments are deprecated to avoid confusion with arrow concepts and to keep consistency (a sketch with the new names follows this list) :

* `chunk_size` is replaced by `max_rows` (chunk size is an arrow concept).
* `chunk_memory_size` is replaced by `max_memory` for consistency.
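
A minimal sketch with the new names, reusing the haven example file that appears in the snippets below (the `max_rows` value is purely illustrative):

```r
# Chunked conversion with the new argument names
# (50 rows per file is an arbitrary illustrative value).
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  max_rows = 50  # formerly chunk_size
)
```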

This release includes :

Numerous contributions by @nbc to parquetize !

Due to these numerous contributions, @nbc is now officially part of the project authors !

After a big refactoring, three arguments are deprecated :

* `by_chunk` : `table_to_parquet` will automatically chunk if you use one of `chunk_memory_size` or `chunk_size`.
* `csv_as_a_zip` : `csv_to_parquet` will detect if the file is a zip by its extension.
* `url_to_csv` : use `path_to_csv` instead, `csv_to_parquet` will detect if the file is remote from the file path (a short sketch follows).

They will raise a deprecation warning for the moment.
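
A minimal sketch of the detection behavior (the URL is hypothetical, for illustration only):

```r
# csv_to_parquet() now infers everything from the path itself :
# a remote file is detected from the URL, a zip from its extension.
csv_to_parquet(
  path_to_csv = "https://example.com/data.csv.zip",  # hypothetical URL
  path_to_parquet = tempdir()
)
```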

The possibility to chunk parquet output by memory size with `table_to_parquet()` : `table_to_parquet()` takes a `chunk_memory_size` argument to convert an input file into parquet files of roughly `chunk_memory_size` Mb each when data are loaded in memory. Argument `by_chunk` is deprecated (see above).

Example of use of the argument `chunk_memory_size` :

```r
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  chunk_memory_size = 5000  # this will create files of around 5 Gb when loaded in memory
)
```

The functionality for users to pass arguments to `write_parquet()` when chunking (in the ellipsis). This can be used, for example, to pass `compression` and `compression_level`.

Example :

```r
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  compression = "zstd",
  compression_level = 10,
  chunk_memory_size = 5000
)
```

A new function `download_extract` : this function is added to … download and unzip files if needed.

```r
file_path <- download_extract(
  "https://www.nomisweb.co.uk/output/census/2021/census2021-ts007.zip",
  filename_in_zip = "census2021-ts007-ctry.csv"
)
csv_to_parquet(
  file_path,
  path_to_parquet = tempdir()
)
```

Under the hood, this release also hardens the tests.

This release fixes an error when converting a SAS file by chunk.

This release includes :

* … `table_to_parquet()` and `csv_to_parquet()` functions #20
* … `inst/extdata` directory.

This release includes :

* The `table_to_parquet()` function has been fixed when the argument `by_chunk` is `TRUE`.

This release removes the `duckdb_to_parquet()` function on the advice of Brian Ripley from CRAN. Indeed, the storage of DuckDB is not yet stable; it will be stabilized when version 1.0 is released.

This release includes corrections for CRAN submission.

This release includes an important feature :

The `table_to_parquet()` function can now convert tables to parquet format with less memory consumption. Useful for huge tables and for computers with little RAM. (#15) A vignette has been written about it.

* Removal of the `nb_rows` argument in the `table_to_parquet()` function
* Replaced by the arguments `by_chunk`, `chunk_size` and `skip`
(see documentation; a short sketch follows)
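
For illustration, a minimal sketch of these chunking arguments, reusing the haven example file from the snippets above (the `chunk_size` value is arbitrary):

```r
# Chunked conversion : by_chunk enables chunking,
# chunk_size sets the number of rows per chunk.
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  by_chunk = TRUE,
  chunk_size = 50  # illustrative value
)
```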

* Added the `duckdb_to_parquet()` function to convert duckdb files to parquet format.
* Added the `sqlite_to_parquet()` function to convert sqlite files to parquet format.
* Added the `rds_to_parquet()` function to convert rds files to parquet format.
* Added the `json_to_parquet()` function to convert json and ndjson files to parquet format (see the sketch after this list).
* Added a check that `path_to_parquet` exists in the functions `csv_to_parquet()` or `table_to_parquet()` (@py-b)
* Added the `table_to_parquet()` function to convert SAS, SPSS and Stata files to parquet format.
* Added the `csv_to_parquet()` function to convert csv files to parquet format.
* Added the `parquetize_example()` function to get the path to package data examples.
* Added a `NEWS.md` file to track changes to the package.
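
A minimal sketch of one of these converters (the `iris.json` file name and its presence among the package examples are assumptions for illustration):

```r
# Convert a json example file to parquet ("iris.json" is assumed
# to ship with the package, purely for illustration).
json_to_parquet(
  path_to_json = parquetize_example("iris.json"),
  path_to_parquet = tempdir()
)
```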