File Coverage

lib/Cheater.pm
Criterion Covered Total %
statement 4 6 66.6
branch n/a
condition n/a
subroutine 2 2 100.0
pod n/a
total 6 8 75.0


line stmt bran cond sub pod time code
1             package Cheater;
2              
3 11     11   69002 use Cheater::Parser;
  11         53  
  11         1400  
4 11     11   6010 use Cheater::AST;
  0            
  0            
5             use Cheater::Eval;
6              
7             our $VERSION = '0.10';
8              
9             1;
10             __END__
11              
12             =encoding utf-8
13              
14             =head1 NAME
15              
16             Cheater - Generate random database based on rules
17              
18             =head1 VERSION
19              
20             This document describes Cheater 0.10 released on June 24, 2011.
21              
22             =head1 DESCRIPTION
23              
24             Cheater is a tool that can generate random database based on rules. It's being widely used
25             within the LineZing team of Taobao.com.
26              
27             Compared to other similar tools, C<cheater> has the following advantages:
28              
29             =over
30              
31             =item *
32              
33             it can automatically handle the association and foreign key restrictions among
34             data tables, so it's the real "database instance generator".
35              
36             =item *
37              
38             It defines a SQL-like little language to specify the data model that we want to generate from.
39              
40             =item *
41              
42             It supports powerful C<{a, b, c}> discrete enumation sets, numerical/time/date interval syntax C<a..b>,
43             Perl regular expressions C</regex/>, constant values C<'string'>, C<1.32>, and etc, to describe the value
44             range of data table field.
45              
46             =item *
47              
48             It can generate JSON or SQL insert statements to ease importing to RDMBSes like MySQL/PostgreSQL.
49              
50             =back
51              
52             Below is a very simple example to demonstrate its basic usage.
53              
54             First of all, we create a F<.cht> input file in our working directory (say, under F<~/work/>),
55             in order to describe the data model that we want to geneate data from. Assuming we have
56             a F<company.cht> file like this:
57              
58             # Empolyee table
59             table employees (
60             id serial;
61             name text /[A-Z]a-z{2,5} [A-Z]a-z{2,7}/ not null unique;
62             age integer 18..60 not null;
63             tel text /1[35]8\d{8}/;
64             birthday date;
65             height real 1.50 .. 1.90 not null;
66             grades text {'A','B','C','D','E'} not null;
67             department references departments.id;
68             )
69              
70             # Department table
71             table departments (
72             id serial;
73             name text /\w{2,10}/ not null;
74             )
75              
76             10 employees;
77             2 departments;
78              
79             Here we're using the little language (or DSL) defined by C<cheater> itself. It's semantics
80             is self-explanatory. In particular, the last two lines state that we want to generate 10 rows
81             of data for the C<employees> table and 2 rows for the C<departments> table.
82              
83             And then, we use the F<cht-compile> command to compile our F<company.cht> file to generate a
84             random database instance:
85              
86             $ cht-compile company.cht
87             Wrote ./data/departments.schema.json
88             Wrote ./data/departments.rows.json
89             Wrote ./data/employees.schema.json
90             Wrote ./data/employees.rows.json
91              
92             We see that it generates two F<.json> data files for the C<departments> and C<employees> tables,
93             respectively. For example, the F<data/emplyees.rows.json> file on my machine resulting from
94             a particular run looks like this:
95              
96             $ cat data/employees.rows.json
97             [["id","name","age","tel","birthday","height","grades","department"],
98             ["7606","Kxhwcn Cflub",54,"15872171866","2011-04-01","1.67276","D","408862"],
99             ["63649","Whf Iajgw",55,"13850771916",null,"1.65297","E","844615"],
100             ["348161","Nnwe Obfkln",27,"15801601215","2011-03-06","1.69275","D","408862"],
101             ["353404","Shgpak Xvqxw",28,"15816453097",null,"1.67796","A","408862"],
102             ["445500","Bdt Mhepht",47,"13855517847",null,"1.89943","C","844615"],
103             ["513515","Ipsa Mcbtk",25,"13874017694","2011-01-06","1.79534","A","844615"],
104             ["658009","Lboe Etqo",27,null,"2011-04-14","1.85162","E","408862"],
105             ["716899","Gey Elacflr",18,"15804516095","2011-02-27","1.75681","A","844615"],
106             ["945911","Hsuz Qcmky",39,"13862516775","2011-05-31","1.75947","B","408862"],
107             ["960643","Qbmbe Ijnbqsb",24,"15872418765","2011-04-11","1.78864","B","844615"]]
108              
109             These are the "row data". On the other hand, F<./data/employees.schema.json> is the table structure
110             definition for the C<employees> table. It looks like this on my side:
111              
112             [{"attrs":[],"name":"id","type":"serial"},
113             {"attrs":["not null","unique"],"name":"name","type":"text"},
114             {"attrs":["not null"],"name":"age","type":"integer"},
115             {"attrs":[],"name":"tel","type":"text"},
116             {"attrs":[],"name":"birthday","type":"date"},
117             {"attrs":["not null"],"name":"height","type":"real"},
118             {"attrs":["not null"],"name":"grades","type":"text"},
119             {"attrs":[],"name":"department","type":"serial"}]
120              
121             We can generate SQL DDL statement files accepted by RDBMSes like MySQL or PostgreSQL from the
122             F<.schema.json> files like this:
123              
124             $ cht-schema2sql data/employees.schema.json
125             Wrote ./sql/employees.schema.sql
126              
127             The output F<.sql> file looks like this:
128              
129             $ cat ./sql/employees.schema.sql
130             drop table if exists employees;
131             create table employees (
132             id serial primary key,
133             name text not null unique,
134             age integer not null,
135             tel text,
136             birthday date,
137             height real not null,
138             grades text not null,
139             department serial
140             );
141              
142             If we want to eliminate the drop table statement in the resulting SQL file, we can
143             specify the C<-n> option while running the F<cht-schema2sql> utility. For instance,
144              
145             $ cht-schema2sql -n data/employees.schema.json
146             Wrote ./sql/employees.schema.sql
147              
148             At last, we can use the F<cht-rows2sql> command to convert those F<.rows.json> data files to
149             F<.sql> files that are ready for relation database systems to import the "row data".
150              
151             $ cht-rows2sql data/*.rows.json
152             Wrote ./sql/departments.rows.sql
153             Wrote ./sql/employees.rows.sql
154              
155             The F<sql/departments.rows.sql> looks like this on my side:
156              
157             $ cat sql/departments.rows.sql
158             insert into departments (id,name) values
159             (408862,'dJRq7LCXL'),
160             (844615,'G_m9Nkh3q');
161              
162             To prevent the resulting data from conflicting with extra unique key restrictions in the targeting
163             RDMBS table, we can use the C<-r> option to make F<cht-rows2sql> generate SQL replace statements
164             to work-around this:
165              
166             $ cht-rows2sql -r data/*.rows.json
167             Wrote ./sql/departments.rows.sql
168             Wrote ./sql/employees.rows.sql
169              
170             Now we're ready to import the random data into database systems like MySQL!
171              
172             $ mysql -u some_user -p dbname < sql/departments.rows.sql
173              
174             For now, C<cheater> is still in active development and lacking comprehensive documentation,
175             the most complete documentation is its (declarative) test suite:
176              
177             L<http://github.com/agentzh/cheater/tree/master/t/>
178              
179             Open one of those F<.t> files, you can see lots of declarative test cases, like these:
180              
181             === TEST 5: datetime range domain
182             --- src
183             table cats (
184             birthday datetime 2010-05-24 03:45:00..2010-06-05 18:46:05 not null;
185             )
186              
187             5 cats;
188             --- out
189             cats
190             birthday
191             2010-06-02 14:59:02
192             2010-06-04 03:31:00
193             2010-06-03 01:51:41
194             2010-05-28 19:29:34
195             2010-06-02 13:31:38
196              
197             =head1 INSTALLATION
198              
199             perl Makefile.PL
200             make
201             make test
202             make install
203              
204             SOURCE REPOSITORY
205             The source repository of this project is on GitHub:
206              
207             L<http://github.com/agentzh/cheater/>
208              
209             If you have found any bugs or feature request, feel free to create tickets on the GitHub issues page:
210              
211             L<http://github.com/agentzh/cheater/issues>
212              
213             =head1 AUTHOR
214              
215             Zhang "agentzh" Yichun (章亦春) C<< <agentzh@gmail.com> >>
216              
217             =head1 COPYRIGHT & LICENSE
218              
219             Copyright (c) 2010-2011, Taobao Inc., Alibaba Group (L<http://www.taobao.com>).
220              
221             Copyright (c) 2010-2011, Zhang "agentzh" Yichun (章亦春) C<< <agentzh@gmail.com> >>.
222              
223             This module is licensed under the terms of the BSD license.
224              
225             Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
226              
227             =over
228              
229             =item *
230              
231             Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
232              
233             =item *
234              
235             Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
236              
237             =item *
238              
239             Neither the name of the Taobao Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
240              
241             =back
242              
243             THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
244              
245             =head1 SEE ALSO
246              
247             L<Parse::RandGen::Regexp>, L<Data::Random>.
248