File Coverage

blib/lib/SQL/Inserter.pm

Criterion	Covered	Total	%
statement	153	153	100.0
branch	52	52	100.0
condition	27	27	100.0
subroutine	20	20	100.0
pod	4	4	100.0
total	256	256	100.0

line	stmt	bran	cond	sub	pod	time	code

1

package SQL::Inserter;

2

3

5

1047510

use 5.008;

5

52

4

5

55

use strict;

5

9

5

121

5

5

25

use warnings;

5

8

5

151

6

7

5

28

use Carp;

5

9

5

320

8

5

31

use Exporter 'import';

5

19

5

9358

9

10

=head1 NAME

11

12

SQL::Inserter - Efficient buffered DBI inserter and fast INSERT SQL builder

13

14

=head1 VERSION

15

16

Version 0.02

17

18

=cut

19

20

our $VERSION = '0.02';

21

22

our @EXPORT_OK = qw(simple_insert multi_insert_sql);

23

24

=head1 SYNOPSIS

25

26

use SQL::Inserter;

27

28

my $sql = SQL::Inserter->new(

29

dbh => $dbh,

30

table => 'table',

31

cols => [qw/col1 col2.../],

32

buffer => 100? # Default buffer is 100 rows

33

);

34

35

# Fastest method: pass single or multiple rows of data as an array

36

$sql->insert($col1_row1, $col2_row1, $col1_row2...);

37

38

# You can manually flush the buffer at any time with no argument on insert

39

# (otherwise there is auto-flush on the object's destruction)

40

$sql->insert();

41

42

# Alternative, pass a single row as a hash, allows SQL code passed as

43

# references instead of values (no need to define cols in constructor)

44

$sql->insert({

45

column1 => $data1,

46

column2 => \'NOW()',

47

...

48

});

49

50

# There are also functions to just get the SQL statement and its bind vars

51

# similar to SQL::Abstract or SQL::Maker insert, but with much less overhead:

52

my ($sql, @bind) = simple_insert($table, {col1=>$val...});

53

54

# Multi-row variant:

55

my ($sql, @bind) = simple_insert($table, [{col1=>$val1...},{col1=>$val2...},...]);

56

57

# Or, construct an SQL statement with placeholders for a given number of rows:

58

my $sql = multi_insert_sql('table', [qw/col1 col2.../], $num_of_rows);

59

60

=head1 DESCRIPTION

61

62

SQL::Inserter's main lightweight OO interface will let you do L<DBI> inserts as

63

efficiently as possible by managing a multi-row buffer and prepared statements.

64

65

You only have to select the number of rows for the buffered writes (default is 100)

66

and choose whether to pass your data in arrays (fastest, requires all data to be bind

67

values, will execute the same prepared statement every time the buffer is full) or

68

hashes (allows SQL code apart from plain values).

69

70

It also provides lightweight functions that return the SQL queries to be used manually,

71

similar to C<SQL::Abstract>, but much faster.

72

73

C<INSERT IGNORE> and C<ON DUPLICATE KEY UPDATE> variants are supported for MySQL/MariaDB.

74

75

=head1 EXPORTS

76

77

On request: C<simple_insert>, C<multi_insert_sql>.

78

79

=head1 CONSTRUCTOR

80

81

=head2 C<new>

82

83

my $sql = SQL::Inserter->new(

84

dbh => $dbh,

85

table => $table,

86

cols => \@column_names?,

87

buffer => 100?,

88

duplicates => $ignore_or_update?,

89

null_undef => $convert_undef_to_NULL?

90

);

91

92

Creates an object to insert data to a specific table. Buffering is enabled by default

93

and anything left in the buffer will be written when the object falls out of scope / is destroyed.

94

95

Required parameters:

96

97

=over 4

98

99

=item * C<dbh> : A L<DBI> database handle.

=item * C<table> : The name of the db table to insert to. See L</NOTES> if you
are using a reserved word for a table name.

=back

Optional parameters:

=over 4

=item * C<cols> : The names of the columns to insert. Required if arrays are
used to pass the data. With hashes they are optional (the order will be followed
if they are defined). See L</NOTES> if you are using any reserved words for
column names.

=item * C<buffer> : Max number of rows to be held in the buffer before there is a write.
The buffer also flushes (writes its contents) when the object is destroyed. Setting it to 1
writes each row separately (least efficient). For small rows you can set the buffer to
thousands. The default is a (conservative) 100, which works with big data rows.

=item * C<duplicates> : For MySQL, define as C<'ignore'> or C<'update'> to get an
C<INSERT IGNORE> or C<ON DUPLICATE KEY UPDATE> query respectively. See L</NOTES>
for details on the latter.

=item * C<null_undef> : Applies to the hash inserts only. If true, any undefined
values will be converted to SQL's C<NULL> (similar to the C<SQL::Abstract> default).
The default behaviour will leave an undef as the bind variable, which may either
create an empty string in the db or give an error depending on your column type and
db settings.

=back

=cut

sub new {

13

36451

my $class = shift;

13

46

my %args = @_;

13

22

my $self = {};

13

25

bless($self, $class);

13

265

$self->{dbh} = $args{dbh} || croak("dbh parameter (db handle) required.");

12

131

$self->{table} = $args{table} || croak("table parameter required.");

11

26

$self->{cols} = $args{cols};

11

40

$self->{buffer} = $args{buffer} || 100;

11

18

$self->{dupes} = $args{duplicates};

11

21

$self->{null} = $args{null_undef};

11

36

if ($self->{dupes}) {

2

6

$self->{ignore} = 1 if $self->{dupes} eq "ignore";

2

7

$self->{update} = 1 if $self->{dupes} eq "update";

}

11

41

$self->_cleanup();

11

49

return $self;

}

=head1 METHODS

=head2 insert

# Fastest array method. Only bind data is passed.

my $ret = $sql->insert(@column_data_array);

# Alternative, allows SQL code as values in addition to bind variables

my $ret = $sql->insert(\%row_data);

# No parameters will force emptying of the buffer (db write)

my $ret = $sql->insert();

The main insert method. Returns the return value of the last C<execute> statement
if one was called, 0 otherwise (buffer not full).

It works in two main modes, by passing an array or a hashref:

=over 4

=item Array mode

Pass the data for one or more rows in a flat array; buffering will work automatically
based on your C<buffer> setting. Your C<@column_data_array> has to contain
a multiple of the number of C<cols> defined on the constructor.

This is the fastest mode, but it only allows simple bind values. Any undefined values
will be passed directly to DBI->execute, which may or may not be what you expect -
there will not be any explicit conversion to SQL C<NULL>.

=item Hash mode

Pass a reference to a hash containing the column names & values for a single row
of data. If C<cols> was not defined on the constructor, the columns from the first
data row will be used instead. For subsequent rows any extra columns will be disregarded
and any missing columns will be considered to have an C<undef> value (which can be
automatically converted to C<NULL> if the C<null_undef> option was set).

=item Flushing the buffer

Calling C<insert> with no arguments forces a write to the db, flushing the buffer.

You don't have to call this manually as the buffer will be flushed when the object

is destroyed (e.g. your object falls out of scope).

=item Mixing modes

You can theoretically mix modes, but only when the buffer is empty, e.g. you can start
with the array mode, flush the buffer and continue with hash mode (C<cols> will be
defined from the array mode). Or you can start with hash mode (so C<cols> will be defined
from the very first hash), and after flushing the buffer you can switch to array mode.

=back
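As a rough illustration of the array-mode buffering described above, the following stand-alone sketch models the bookkeeping without a database. The function name C<buffered_writes> and its return shape are hypothetical and not part of the module, and a "write" here only records how many rows would have been sent. It moves one row at a time into the buffer, which is simpler than the listing's own loop but should give the same write pattern.

```perl
use strict;
use warnings;

# Hypothetical model of array-mode buffering (not the module's API).
# Returns an arrayref with the row count of each simulated write, plus the
# number of rows still waiting in the buffer.
sub buffered_writes {
    my ($buffer, $num_cols, @data) = @_;
    die "Insert arguments must be multiple of cols\n" if @data % $num_cols;

    my (@pending, @writes);
    while (@data) {
        push @pending, splice(@data, 0, $num_cols);    # move one row into the buffer
        if (@pending / $num_cols == $buffer) {         # buffer full: simulate a write
            push @writes, $buffer;
            @pending = ();
        }
    }
    return (\@writes, @pending / $num_cols);
}

# 5 rows of 2 columns with a 2-row buffer
my ($writes, $left) = buffered_writes(2, 2, 1 .. 10);
printf "%d full writes, %d row(s) still buffered\n", scalar @$writes, $left;
```

With these inputs the model performs two full writes and keeps one row buffered, which is the row that a no-argument C<insert> or object destruction would flush.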

=cut

sub insert {

33

6701

my $self = shift;

33

171

return $self->_hash_insert(@_) if $_[0] and ref($_[0]);

18

34

my $ret = 0;

18

50

if (@_) {

croak("Calling insert without a hash requires cols defined in constructor")

14

124

unless $self->{cols};

croak("Insert arguments must be multiple of cols")

13

20

if scalar(@_) % scalar @{$self->{cols}};

13

178

croak("Insert was previously called with hash argument (still in buffer)")

11

105

if $self->{hash_buffer};

10

26

while (@_) {

14

19

my $rows = scalar(@_) / scalar @{$self->{cols}};

14

30

14

28

my $left = $self->{buffer} - $self->{buffer_counter}; # Space left in buffer

14

34

if ($rows > $left) { # Can't fit buffer

4

11

my $max = $left * scalar @{$self->{cols}};

4

13

4

6

push @{$self->{bind}}, splice(@_,0,$max);

4

29

4

9

$self->{buffer_counter} = $self->{buffer};

} else {

10

16

push @{$self->{bind}}, splice(@_);

10

27

10

18

$self->{buffer_counter} += $rows;

}

14

45

$ret = $self->_write_full_buffer() if $self->{buffer_counter} == $self->{buffer};

}

} elsif ($self->{buffer_counter}) { # Empty the buffer

2

21

$ret = $self->_empty_buffer();

}

14

56

return $ret;

}

=head1 ATTRIBUTES

=head2 C<last_retval>

my $val = $sql->{last_retval}

The return value of the last DBI C<execute> is stored in this attribute. On a successful
insert it should contain the number of rows of that statement. Note that an C<insert>
call, depending on the buffering, may call C<execute> zero, one or more times.

=head2 C<row_total>

my $total = $sql->{row_total}

A running total of the return values; for successful inserts it shows you
how many rows were inserted into the database. It will be undef if no C<execute> has
been called.

=head2 C<buffer_counter>

my $count = $sql->{buffer_counter}

Check how many un-inserted data rows the buffer currently holds.

=head1 FUNCTIONS

=head2 simple_insert

# Single row

my ($sql, @bind) = simple_insert($table, \%fieldvals, \%options);

# Multi-row

my ($sql, @bind) = simple_insert($table, [\%fieldvals_row1,...], \%options);

Returns the SQL statement and bind variable array for a hash containing the row

columns and values. Values are treated as bind variables unless they are references

to SQL code strings. E.g. :

my ($sql, @bind) = simple_insert('table', {foo=>"bar",when=>\"NOW()"});

### INSERT INTO table (foo, when) VALUES (?,NOW())

The function also accepts an array of hashes to allow multi-row inserts:

my ($sql, @bind) = simple_insert('table', [{foo=>"foo"},{foo=>"bar"}]);

### INSERT INTO table (foo) VALUES (?),(?)

The first row (element in array) needs to contain the superset of all the columns

that you want to insert, if some of your rows have undefined column data.

Options:

=over 4

=item * C<null_undef> : If true, any undefined values will be converted to SQL's
C<NULL> (similar to the C<SQL::Abstract> default). The default behaviour will leave
an undef as the bind variable, which may either create an empty string in the db or
give an error depending on your column type and db settings.

=item * C<duplicates> : For MySQL, define as C<'ignore'> or C<'update'> to get an
C<INSERT IGNORE> or C<ON DUPLICATE KEY UPDATE> query respectively. See L</NOTES>
for details on the latter.

=back
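To make the distinction between bind values and SQL code references concrete, here is a minimal stand-alone sketch of how one row can be turned into a placeholder group. The helper name C<row_sql> is hypothetical and not part of the module. It is also a simplification: it only treats scalar references as SQL code and does not implement the C<null_undef> option.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's API): plain values become "?" plus a
# bind value, scalar references are inlined as literal SQL.
sub row_sql {
    my ($row, @cols) = @_;
    my (@parts, @bind);
    for my $col (@cols) {
        my $val = $row->{$col};
        if (ref($val) eq 'SCALAR') {
            push @parts, $$val;        # SQL code, copied into the statement as-is
        } else {
            push @parts, '?';          # placeholder, value goes to the bind list
            push @bind, $val;
        }
    }
    return ('(' . join(',', @parts) . ')', @bind);
}

my ($values, @bind) = row_sql({ foo => 'bar', created => \'NOW()' }, qw/foo created/);
print "VALUES $values with bind (@bind)\n";
```

The column order is passed explicitly here because Perl hash key order is not stable between runs.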

=cut

sub simple_insert {

10

13027

my $table = shift;

10

15

my $fields = shift;

10

13

my $opt = shift;

10

18

my ($placeh, @bind, @cols);

10

26

if (ref($fields) eq 'ARRAY') {

3

13

@cols = keys %{$fields->[0]};

3

17

3

4

my @rows;

3

7

foreach my $f (@$fields) {

5

12

my ($row, @b) = _row_placeholders($f, \@cols, $opt->{null_undef});

5

10

push @rows, $row;

5

21

push @bind, @b;

}

3

8

$placeh = join(",\n", @rows);

} else {

7

21

@cols = keys %$fields;

7

24

($placeh, @bind) = _row_placeholders($fields, \@cols, $opt->{null_undef});

}

return _create_insert_sql(

$table, \@cols, $placeh, $opt->{duplicates}

10

45

), @bind;

}

=head2 multi_insert_sql

my $sql = multi_insert_sql(

$table,

\@columns, # names of table columns

$num_of_rows?, # default = 1

$duplicates? # can be set as ignore/update in case of duplicate key (MySQL)

);

Builds bulk insert query (single insert is possible too), with ability for

ignore/on duplicate key update variants for MySQL.

Requires at least the name of the table C<$table> and an arrayref with the column

names C<\@columns>. See L</NOTES> if you want to quote table or column names.

Optional parameters:

=over 4

=item * C<$num_of_rows> : By default it returns SQL with bind value placeholders

for a single row. You can define any number of rows to use with multi-row bind

variable arrays.

=item * C<$duplicates> : For MySQL, passing C<'ignore'> as the 4th argument returns
an C<INSERT IGNORE> query. Passing C<'update'> as the argument returns a query
containing an C<ON DUPLICATE KEY UPDATE> clause (see L</NOTES> for further details).

=back
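The placeholder block for a given number of columns and rows can be sketched in isolation as follows. The helper name C<placeholder_rows> is hypothetical; it uses the same repetition idiom as the listing below, with one group per row and groups separated by a comma and a newline.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's API): one "(?,?,...)" group per row.
sub placeholder_rows {
    my ($num_cols, $num_rows) = @_;
    my $row = '(' . join(',', ('?') x $num_cols) . ')';    # e.g. "(?,?)" for 2 columns
    return join(",\n", ($row) x $num_rows);                # repeat the group per row
}

my $sql = "INSERT INTO table_name (col1,col2)\nVALUES " . placeholder_rows(2, 3);
print "$sql\n";
```

The resulting statement expects a flat bind list of C<$num_cols * $num_rows> values, in row order.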

=cut

sub multi_insert_sql {

18

6214

my $table = shift;

18

32

my $columns = shift;

18

59

my $num_rows = shift || 1;

18

25

my $dupe = shift;

18

105

return unless $table && $columns && @$columns;

15

95

my $placeholders =

join(",\n", ('(' . join(',', ('?') x @$columns) . ')') x $num_rows);

15

39

return _create_insert_sql($table, $columns, $placeholders, $dupe);

}

## Private methods

sub _hash_insert {

15

26

my $self = shift;

15

20

my $fields = shift;

15

21

my $ret = 0;

croak("Insert was previously called with an array argument (still in buffer)")

15

151

if $self->{buffer_counter} && !$self->{hash_buffer};

14

22

$self->{buffer_counter}++;

14

32

$self->{cols} = [keys %$fields] if !defined($self->{cols});

14

31

my ($row, @bind) = _row_placeholders($fields, $self->{cols}, $self->{null});

14

28

push @{$self->{hash_buffer}}, $row;

14

31

14

19

push @{$self->{bind}}, @bind;

14

29

14

38

$ret = $self->_write_hash_buffer() if $self->{buffer_counter} == $self->{buffer};

14

52

return $ret;

}

sub _write_full_buffer {

9

18

my $self = shift;

$self->{full_buffer_insert} = $self->_prepare_full_buffer_insert()

9

41

if !$self->{full_buffer_insert};

9

50

$self->_execute($self->{full_buffer_insert});

9

19

$self->_cleanup();

9

21

return $self->{last_retval};

}

sub _prepare_full_buffer_insert {

6

9

my $self = shift;

$self->{full_buffer_insert} = $self->{dbh}->prepare(

6

14

multi_insert_sql(map {$self->{$_}} qw/table cols buffer dupes/)

24

50

);

}

sub _empty_buffer {

5

11

my $self = shift;

5

24

return $self->_write_hash_buffer() if $self->{hash_buffer};

3

15

my $rows = scalar(@{$self->{bind}}) / scalar @{$self->{cols}};

3

8

3

9

my $sth = $self->{dbh}->prepare(

multi_insert_sql(

$self->{table},

$self->{cols},

$rows,

$self->{dupes}

)

3

15

);

3

29

$self->_execute($sth);

3

11

$self->_cleanup();

3

8

return $self->{last_retval};

}

sub _write_hash_buffer {

9

16

my $self = shift;

9

14

my $placeh = join(",\n", @{$self->{hash_buffer}});

9

24

my $sth = $self->{dbh}->prepare(

_create_insert_sql(

$self->{table}, $self->{cols}, $placeh, $self->{dupes}

)

9

25

);

9

54

$self->_execute($sth);

9

24

$self->_cleanup();

9

22

return $self->{last_retval};

}

sub _execute {

21

78

my $self = shift;

21

29

my $sth = shift;

21

51

$self->{row_total} = 0 if !defined($self->{row_total});

21

32

$self->{last_retval} = $sth->execute(@{$self->{bind}});

21

52

21

142

$self->{row_total} += $self->{last_retval} if $self->{last_retval};

}

sub _cleanup {

32

50

my $self = shift;

32

56

$self->{bind} = undef;

32

67

$self->{hash_buffer} = undef;

32

57

$self->{buffer_counter} = 0;

}

sub DESTROY {

13

18902

my $self = shift;

# Empty buffer

13

78

$self->_empty_buffer() if $self->{buffer_counter};

}

## Private functions

sub _create_insert_sql {

38

3070

my $table = shift;

38

55

my $columns = shift;

38

79

my $placeh = shift;

38

121

my $dupe = shift || "";

38

86

my $ignore = ($dupe eq 'ignore') ? ' IGNORE' : '';

38

79

my $cols = join(',', @$columns);

38

92

my $sql = "INSERT$ignore INTO $table ($cols)\nVALUES $placeh";

38

97

$sql .= _on_duplicate_key_update($columns) if $dupe eq 'update';

38

141

return $sql;

}

sub _row_placeholders {

34

12176

my $fields = shift;

34

48

my $cols = shift;

34

51

my $null = shift;

34

48

my @bind = ();

34

71

my $sql = "(";

34

49

my $val;

34

61

foreach my $key (@$cols) {

56

154

$fields->{$key} = \"NULL" if $null && !defined($fields->{$key});

56

115

if (ref($fields->{$key})) {

12

31

$val = ${$fields->{$key}};

12

26

} else {

44

65

$val = "?";

44

71

push @bind, $fields->{$key};

}

56

107

$sql .= "$val,";

}

34

93

chop($sql) if @$cols;

34

131

return "$sql)", @bind;

}

sub _on_duplicate_key_update {

8

3879

my $columns = shift;

return "\nON DUPLICATE KEY UPDATE "

8

33

. join(',', map {"$_=VALUES($_)"} @$columns);

11

52

}

=head1 NOTES

=head2 Using reserved words as object names

If you are using reserved words as table/column names (which is strongly discouraged),
just include the appropriate delimiter in the C<table> or C<cols> parameter. E.g. for
MySQL with columns named C<from> and C<to> you can do:

cols => [qw/`from` `to`/]

For PostgreSQL or Oracle you'd do C<[qw/"from" "to"/]>, for SQL Server C<[qw/[from] [to]/]> etc.
541
=head2 On duplicate key update

The C<< duplicates => 'update' >> option creates an C<ON DUPLICATE KEY UPDATE> clause
for the query. E.g.:

my $sql = multi_insert_sql('table_name', [qw/col1 col2/], 2, 'update');

will produce:

## INSERT INTO table_name (col1,col2) VALUES (?,?),(?,?) ON DUPLICATE KEY UPDATE col1=VALUES(col1),col2=VALUES(col2)

Note that as of MySQL 8.0.20 the C<VALUES> in C<ON DUPLICATE KEY UPDATE> is deprecated (a row alias is
used instead), so this functionality might need to be updated some day if C<VALUES> is
removed completely.
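For reference, the row alias form mentioned above could be generated along these lines. This is only a sketch of the newer MySQL syntax and is not something the module provides. The helper name C<on_duplicate_alias> and the alias C<new> are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): builds the update clause
# with a row alias instead of VALUES(). The alias name is an arbitrary choice.
sub on_duplicate_alias {
    my ($alias, @cols) = @_;
    return "\nAS $alias\nON DUPLICATE KEY UPDATE "
        . join(',', map { "$_=$alias.$_" } @cols);
}

my $sql = "INSERT INTO table_name (col1,col2)\nVALUES (?,?),\n(?,?)"
    . on_duplicate_alias('new', qw/col1 col2/);
print "$sql\n";
```

The alias syntax is only accepted by MySQL versions that support row aliases, so the C<VALUES> form remains the portable choice for older servers and MariaDB.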
556
=head2 Output whitespace

No spaces are added to the output string beyond the minimum. However, there is a new
line (C<\n>) added for each row of value placeholders - mainly to easily count the
number of rows from the string.
Also, the C<ON DUPLICATE KEY UPDATE> clause is on a new line.
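Counting the rows of a generated statement from its newlines could look like the sketch below. The helper name C<count_rows> is hypothetical, and it assumes the layout described above: a newline before C<VALUES>, one between rows, and one before the update clause. It also assumes that no SQL code passed in hash mode contains a newline of its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's API): counts placeholder rows by
# counting the newlines inside the VALUES block of a generated statement.
sub count_rows {
    my ($sql) = @_;
    my ($values) = $sql =~ /\nVALUES (.*?)(?:\nON DUPLICATE KEY UPDATE |\z)/s;
    return 0 unless defined $values;
    my $newlines = () = $values =~ /\n/g;    # one newline between consecutive rows
    return $newlines + 1;
}

my $sql = "INSERT INTO t (a,b)\nVALUES (?,?),\n(?,?),\n(?,?)";
printf "%d rows\n", count_rows($sql);
```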
563
=head2 Error handling

The module does not do any error handling on C<prepare>/C<execute> statements;
you should use L<DBI>'s C<RaiseError> and C<HandleError>.
568
=head2 Performance

The OO interface has minimal overhead. The only consideration is that if your rows
do not contain particularly large amounts of data, you may want to increase the buffer
size which is at a modest 100 rows.

Internally, to construct the prepared statements it uses similar logic to the public
functions. C<simple_insert> is of particular interest as it is a minimalistic function
that may replace (similar interface / feature set) the C<insert> functions from
C<SQL::Abstract> or C<SQL::Maker> while being over 40x faster than the former and
around 3x faster than the latter. The included benchmark script gives
an idea (results on an M1 Pro Macbook):

Compare SQL::Abstract, SQL::Maker, simple_insert:
                    Rate Abstract Abstract cached Maker Maker cached simple_insert
Abstract          4207/s       --             -6%  -90%         -91%          -98%
Abstract cached   4482/s       7%              --  -90%         -90%          -98%
Maker            44245/s     952%            887%    --          -4%          -76%
Maker cached     46205/s     998%            931%    4%           --          -75%
simple_insert   187398/s    4355%           4081%  324%         306%            --

Compare simple_insert, multi_insert_sql for single row:
                     Rate simple_insert multi_insert_sql
simple_insert    190037/s            --             -76%
multi_insert_sql 797596/s          320%               --
594
=head1 AUTHOR

Dimitrios Kechagias, C<< >>

=head1 BUGS

Please report any bugs or feature requests either on L (preferred), or on RT
(via the email , or L).

I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

=head1 GIT

L

=head1 CPAN

L

=head1 LICENSE AND COPYRIGHT

Copyright (C) 2023, SpareRoom

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut

1; # End of SQL::Inserter