RaftPeople 3 days ago
> What my workself would love is to easily dump Pandas or Polar data frames to SQL Tables in SQL Server as fast as possible

We run into this issue too: we want to upload large volumes of data but don't want to assume access to BCP on every DB server. We wrote a little utility that works pretty fast compared to the other methods we've tested (fast = about 1,000,000 rows per minute for a table with 10 random columns of random data). Here's the approach:

1-Convert rows into fixed-length strings so each row is uploaded as one single varchar column (which makes parsing and execution of the SQL statement during upload much quicker).

2-Repeatedly upload groups of fixed-length rows into a temp table until all are uploaded. Details: multiple fixed-length rows are combined into one fixed-length varchar value that is uploaded as a single raw buffer row. We found a buffer size of 15,000 to be the sweet spot. Multiple threads each process a subset of the source data rows; we found 5 threads to be generally pretty good. At the end of this step, the destination temp table has X rows of buffers (the buffer column is just a varchar(15000)), and inside each of those buffers are Y source data rows with Z columns in fixed format.

3-Once the buffer rows are all uploaded, split out the source data rows and columns using a temp sproc generated for the exact schema (e.g. substring(Buffer_Data,x,y) as Cust_Name).
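A rough sketch of the packing side of the approach described above, in plain Python with no database connection. This is not the utility from the comment: the schema, the column widths, the helper names (`encode_row`, `pack_buffers`, `split_select_list`) and the `@row_start` variable are all hypothetical, chosen only to illustrate steps 1 to 3. The threaded upload itself is left out.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical schema: (column name, fixed width in characters).
Schema = List[Tuple[str, int]]
SCHEMA: Schema = [("Cust_Name", 20), ("City", 12), ("Balance", 10)]
BUFFER_SIZE = 15_000  # buffer size quoted in the comment


def encode_row(row: Sequence[object], schema: Schema = SCHEMA) -> str:
    """Step 1: pad each value to its column width and concatenate.

    Values longer than the column width are truncated here, which
    silently loses data; a real tool would probably raise instead.
    """
    return "".join(str(v)[:w].ljust(w) for v, (_, w) in zip(row, schema))


def pack_buffers(
    rows: Iterable[Sequence[object]],
    schema: Schema = SCHEMA,
    buffer_size: int = BUFFER_SIZE,
) -> Iterator[str]:
    """Step 2: yield strings that each hold as many whole rows as fit.

    The last buffer may be shorter than the others in this sketch.
    """
    row_len = sum(w for _, w in schema)
    rows_per_buffer = buffer_size // row_len
    if rows_per_buffer == 0:
        raise ValueError("a single row does not fit in one buffer")
    chunk: List[str] = []
    for row in rows:
        chunk.append(encode_row(row, schema))
        if len(chunk) == rows_per_buffer:
            yield "".join(chunk)
            chunk = []
    if chunk:
        yield "".join(chunk)


def split_select_list(schema: Schema = SCHEMA, row_start: str = "@row_start") -> str:
    """Step 3: build the SUBSTRING(...) AS col list for one packed row.

    `row_start` names a hypothetical T-SQL variable holding the 1-based
    position of the row inside Buffer_Data.
    """
    parts = []
    offset = 0
    for name, width in schema:
        parts.append(
            f"RTRIM(SUBSTRING(Buffer_Data, {row_start} + {offset}, {width})) AS {name}"
        )
        offset += width
    return ",\n".join(parts)
```

Each string yielded by `pack_buffers` would become one row of the temp buffer table, and the output of `split_select_list` is the kind of select list a generated temp sproc could use to unpack a row. How the sproc iterates over the Y rows inside each buffer is not specified in the comment, so it is not shown here.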
ludamn 3 days ago | parent
[dead]