Based on DuckDB v1.1.2
Query Execution Flow
- Parser
- Planner
  - Entry point: Planner::CreatePlan
- Binder
  - Entry point: Binder::Bind
- Executor
  - Entry point: Executor::ExecuteTask
Copy
PipelineExecutor::Execute
-> PipelineExecutor::PushFinalize
-> PhysicalCopyToFile::Combine
-> (custom) CopyFunction::copy_to_combine
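The call chain above ends in a callback supplied by the custom copy function. A minimal sketch of that dispatch pattern follows. Every type here (ToyCopyFunction, ToyPhysicalCopyToFile, ToyPipelineExecutor, MyCombine, ToyRunCopy) is a hypothetical stand-in, not DuckDB's real class, and the real methods take more arguments than shown.

```cpp
#include <string>
#include <vector>

using Trace = std::vector<std::string>;

// Stand-in for CopyFunction: a struct of callbacks that a custom COPY
// implementation fills in. Only the combine hook is modeled here.
struct ToyCopyFunction {
    void (*copy_to_combine)(Trace &trace) = nullptr;
};

// Stand-in for PhysicalCopyToFile: forwards Combine to the custom hook if set.
struct ToyPhysicalCopyToFile {
    ToyCopyFunction function;
    void Combine(Trace &trace) const {
        trace.push_back("PhysicalCopyToFile::Combine");
        if (function.copy_to_combine) {
            function.copy_to_combine(trace);
        }
    }
};

// Stand-in for PipelineExecutor: Execute ends by finalizing the sink.
struct ToyPipelineExecutor {
    const ToyPhysicalCopyToFile &sink;
    void PushFinalize(Trace &trace) const {
        trace.push_back("PipelineExecutor::PushFinalize");
        sink.Combine(trace);
    }
    void Execute(Trace &trace) const {
        trace.push_back("PipelineExecutor::Execute");
        // ... data chunks would be pushed into the sink here ...
        PushFinalize(trace);
    }
};

// Hypothetical custom hook, as an extension author might provide it.
inline void MyCombine(Trace &trace) {
    trace.push_back("copy_to_combine (custom)");
}

// Runs the toy chain and returns the order in which the steps were reached.
inline Trace ToyRunCopy() {
    ToyPhysicalCopyToFile copy_op;
    copy_op.function.copy_to_combine = MyCombine;
    ToyPipelineExecutor executor{copy_op};
    Trace trace;
    executor.Execute(trace);
    return trace;
}
```

Calling ToyRunCopy() returns the four steps in the same order as the chain in these notes.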
Join
Bulk load
If the transaction is rolled back or aborted, the blocks that were pre-emptively written to disk are marked as unused and reclaimed by the system for use in subsequent writes. The database file may still grow temporarily, and gaps may appear in it if multiple transactions write at the same time and a subset of them abort. That space is not lost, however: the system reuses it when new data is ingested.
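The reuse behavior can be illustrated with a toy free-list model. This is only a sketch of the idea, not DuckDB's block manager: ToyBlockFile and SimulateAbortAndReuse are hypothetical names, and real blocks carry data, checksums, and versioning that are ignored here.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy model of a database file made of fixed-size blocks.
class ToyBlockFile {
public:
    // Reuse a free block if one exists, otherwise grow the file by one block.
    std::size_t Allocate() {
        if (!free_blocks_.empty()) {
            std::size_t id = *free_blocks_.begin();
            free_blocks_.erase(free_blocks_.begin());
            return id;
        }
        return total_blocks_++;
    }

    // Rollback: blocks written by the aborted transaction are marked unused.
    // The file does not shrink; the blocks only become reusable.
    void Rollback(const std::vector<std::size_t> &blocks) {
        free_blocks_.insert(blocks.begin(), blocks.end());
    }

    std::size_t TotalBlocks() const { return total_blocks_; }
    std::size_t FreeBlocks() const { return free_blocks_.size(); }

private:
    std::size_t total_blocks_ = 0;
    std::set<std::size_t> free_blocks_;
};

// Two transactions write interleaved blocks and the first one aborts.
// Returns the file size in blocks after a later transaction writes two blocks.
inline std::size_t SimulateAbortAndReuse() {
    ToyBlockFile file;
    std::vector<std::size_t> tx1, tx2;
    for (int i = 0; i < 2; i++) {
        tx1.push_back(file.Allocate()); // tx1 gets blocks 0 and 2
        tx2.push_back(file.Allocate()); // tx2 gets blocks 1 and 3
    }
    file.Rollback(tx1); // gaps at blocks 0 and 2
    file.Allocate();    // new data reuses block 0
    file.Allocate();    // new data reuses block 2
    return file.TotalBlocks();
}
```

In this model the file holds four blocks after the abort and still holds four after the later writes, because the gaps left by the aborted transaction are filled before the file grows.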
Internal data format
Use Extension
Read the extension's README.md first.
Install
- Extensions are downloaded from “extensions.duckdb.org” by default.
  - Custom extension repository can be configured by custom_extension_repository.
    - Distribute your extension
- Extensions are installed in ${DBConfigOptions.extension_directory}/${version_dir}/${platform}; when extension_directory is not set, the base directory defaults to $HOME/.duckdb/extensions
related code: src/main/extension/extension_install.cpp
Load
- Load procedure
  - call {extension_name}_version to check the extension version and compare it with the running DuckDB version
  - call {extension_name}_init
related code: src/main/extension/extension_load.cpp
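The two-step load procedure can be sketched as follows. This is a simplified model under stated assumptions: a real loader resolves the symbols from the shared library (for example with dlopen/dlsym), while here a map stands in for the symbol table so the sketch stays self-contained. ToySymbols and ToyLoadExtension are hypothetical names, and comparing versions by exact string equality is an assumption.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for the symbols exported by a loadable extension binary.
struct ToySymbols {
    std::map<std::string, std::function<std::string()>> version_fns;
    std::map<std::string, std::function<void()>> init_fns;
};

// Step 1: call {extension_name}_version and compare with the running version.
// Step 2: call {extension_name}_init.
inline void ToyLoadExtension(const ToySymbols &symbols, const std::string &name,
                             const std::string &running_version) {
    auto version_it = symbols.version_fns.find(name + "_version");
    if (version_it == symbols.version_fns.end()) {
        throw std::runtime_error("missing symbol: " + name + "_version");
    }
    const std::string extension_version = version_it->second();
    if (extension_version != running_version) {
        throw std::runtime_error("version mismatch: extension built for " +
                                 extension_version + ", running " + running_version);
    }
    auto init_it = symbols.init_fns.find(name + "_init");
    if (init_it == symbols.init_fns.end()) {
        throw std::runtime_error("missing symbol: " + name + "_init");
    }
    init_it->second();
}
```

The version check runs before init so that an incompatible binary never gets to register anything.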
Extension implementation
- Implement {extension_name}_init and {extension_name}_version
- Implement Extension class and call DuckDB::LoadExtension()
  - Load is skipped on the second time.
related code: src/include/duckdb/main/extension.hpp
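A rough sketch of how these pieces fit together, using stand-in types so that it compiles on its own. ToyDatabase, ToyExtension, and QuackExtension are hypothetical and only mirror the shape described above; the extension name "quack" and the version string are placeholders. In a real extension the entry points would take DuckDB's own types.

```cpp
#include <set>
#include <string>

struct ToyDatabase;

// Stand-in for the Extension base class.
struct ToyExtension {
    virtual ~ToyExtension() = default;
    virtual void Load(ToyDatabase &db) = 0;
    virtual std::string Name() = 0;
};

// Stand-in for the database object that tracks loaded extensions.
struct ToyDatabase {
    std::set<std::string> loaded_extensions;
    int registered_functions = 0;

    // Runs Load() at most once per extension name.
    template <class T>
    void LoadExtension() {
        T extension;
        if (loaded_extensions.count(extension.Name()) > 0) {
            return; // already loaded: the second load is skipped
        }
        extension.Load(*this);
        loaded_extensions.insert(extension.Name());
    }
};

// The extension itself: Load() is where functions would be registered.
struct QuackExtension : ToyExtension {
    void Load(ToyDatabase &db) override { db.registered_functions++; }
    std::string Name() override { return "quack"; }
};

// C entry points following the {extension_name}_init / _version naming.
extern "C" void quack_init(ToyDatabase &db) {
    db.LoadExtension<QuackExtension>();
}

extern "C" const char *quack_version() {
    return "v1.1.2"; // placeholder: should match the DuckDB version built against
}
```

Calling quack_init twice on the same ToyDatabase runs Load() only once, which is the skip-on-second-load behavior noted above.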
Extension types
- Function types are defined on src/include/duckdb/function
- Use ExtensionUtil::RegisterFunction to register function
  - It creates an object of CreateFunctionInfo
Table function
related code: src/include/duckdb/function/table_function.hpp
- Create an instance of TableFunction
  - required fields are function and bind
- bind: parse options and return FunctionData that stores parameters required to process scan
  - caller: Binder::BindTableFunctionInternal
  - fill return_types and names
- init_global
  - caller: constructor of TableScanGlobalSourceState (<- Executor::Initialize)
  - override MaxThreads for multi threading
- init_local
  - caller: constructor of TableScanLocalSourceState (<- PipelineTask::ExecuteTask <- Executor::ExecuteTask)
- function: fill DataChunk and return until scan completes
  - caller: PhysicalTableScan::GetData (<- PipelineTask::ExecuteTask <- Executor::ExecuteTask)
    - finish if chunk.size() == 0
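The lifecycle above (bind, init_global, init_local, then function until an empty chunk) can be sketched with stand-in types. Nothing here is DuckDB's real API: ToyBindData, ToyGlobalState, ToyLocalState, ToyTableFunction, and ToyScan are hypothetical, a std::vector<int> stands in for DataChunk, and the chunk size of 3 is arbitrary. The sketch is single-threaded, so there is one local state.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using ToyChunk = std::vector<int>; // stand-in for DataChunk

struct ToyBindData {    // stand-in for FunctionData: parameters for the scan
    int row_count;
};
struct ToyGlobalState { // shared by all threads of one scan
    int next_row = 0;
};
struct ToyLocalState {  // one per thread
    std::size_t chunks_produced = 0;
};

struct ToyTableFunction {
    // bind: parse options, fill return_types and names, return the bind data.
    static ToyBindData Bind(int requested_rows, std::vector<std::string> &return_types,
                            std::vector<std::string> &names) {
        return_types.push_back("INTEGER");
        names.push_back("i");
        return ToyBindData{requested_rows};
    }
    static ToyGlobalState InitGlobal(const ToyBindData &) { return ToyGlobalState(); }
    static ToyLocalState InitLocal(const ToyBindData &, ToyGlobalState &) {
        return ToyLocalState();
    }
    // function: fill the chunk; leaving it empty signals that the scan is done.
    static void Function(const ToyBindData &bind, ToyGlobalState &global,
                         ToyLocalState &local, ToyChunk &out) {
        out.clear();
        const int chunk_size = 3;
        const int end = std::min(bind.row_count, global.next_row + chunk_size);
        for (; global.next_row < end; global.next_row++) {
            out.push_back(global.next_row);
        }
        if (!out.empty()) {
            local.chunks_produced++;
        }
    }
};

// Driver that plays the role of the engine: calls the hooks in order and
// stops when function produces an empty chunk.
inline std::vector<int> ToyScan(int requested_rows) {
    std::vector<std::string> return_types, names;
    ToyBindData bind = ToyTableFunction::Bind(requested_rows, return_types, names);
    ToyGlobalState global = ToyTableFunction::InitGlobal(bind);
    ToyLocalState local = ToyTableFunction::InitLocal(bind, global);
    std::vector<int> rows;
    ToyChunk chunk;
    while (true) {
        ToyTableFunction::Function(bind, global, local, chunk);
        if (chunk.empty()) {
            break; // chunk.size() == 0 ends the scan
        }
        rows.insert(rows.end(), chunk.begin(), chunk.end());
    }
    return rows;
}
```

Keeping the scan position in the global state is what would let several threads, each with its own local state, pull disjoint ranges from the same scan; a real implementation would need to synchronize access to it.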
Reading multiple files
- MultiFileReader::ParseOptions: parse options for multi file reader
  - typically used in TableFunction::bind
- MultiFileReader::FinalizeBind
  - typically used in TableFunction::init_global
- MultiFileReader::FinalizeChunk
  - typically used in TableFunction::function
Copy function
related code: src/include/duckdb/function/copy_function.hpp
Tips
- Use unsigned extensions from the CLI: duckdb -unsigned
- Extension build type: loadable extension binaries can be built in two ways:
  - EXTENSION_STATIC_BUILD=1: DuckDB is statically linked into each extension binary. This increases portability because in several situations DuckDB itself may have been loaded with RTLD_LOCAL. This is currently the main way we distribute the loadable extension binaries.
  - EXTENSION_STATIC_BUILD=0: The DuckDB symbols required by the loadable extensions are left unresolved. This reduces the size of the binaries and works well when running the DuckDB CLI directly. On Windows this uses delay loading. On macOS and Linux the dynamic loader looks up the missing symbols when the extension is dlopen-ed.