File Coverage

lib/Cheater.pm

Criterion	Covered	Total	%
statement	4	6	66.6
branch			n/a
condition			n/a
subroutine	2	2	100.0
pod			n/a
total	6	8	75.0

line	stmt	sub	time	code
1				package Cheater;
2
3	11	11	69002	use Cheater::Parser;
	11		53
	11		1400
4	11	11	6010	use Cheater::AST;
	0
	0
5				use Cheater::Eval;
6
7				our $VERSION = '0.10';
8
9				1;
10				__END__
11
12				=encoding utf-8
13
14				=head1 NAME
15
16				Cheater - Generate random database based on rules
17
18				=head1 VERSION
19
20				This document describes Cheater 0.10 released on June 24, 2011.
21
22				=head1 DESCRIPTION
23
24				Cheater is a tool that can generate random database based on rules. It's being widely used
25				within the LineZing team of Taobao.com.
26
27				Compared to other similar tools, C<cheater> has the following advantages:
28
29				=over
30
31				=item *
32
33				it can automatically handle the association and foreign key restrictions among
34				data tables, so it's the real "database instance generator".
35
36				=item *
37
38				It defines a SQL-like little language to specify the data model that we want to generate from.
39
40				=item *
41
42				It supports powerful C<{a, b, c}> discrete enumation sets, numerical/time/date interval syntax C<a..b>,
43				Perl regular expressions C</regex/>, constant values C<'string'>, C<1.32>, and etc, to describe the value
44				range of data table field.
45
46				=item *
47
48				It can generate JSON or SQL insert statements to ease importing to RDMBSes like MySQL/PostgreSQL.
49
50				=back
51
52				Below is a very simple example to demonstrate its basic usage.
53
54				First of all, we create a F<.cht> input file in our working directory (say, under F<~/work/>),
55				in order to describe the data model that we want to geneate data from. Assuming we have
56				a F<company.cht> file like this:
57
58				# Empolyee table
59				table employees (
60				id serial;
61				name text /[A-Z]a-z{2,5} [A-Z]a-z{2,7}/ not null unique;
62				age integer 18..60 not null;
63				tel text /1[35]8\d{8}/;
64				birthday date;
65				height real 1.50 .. 1.90 not null;
66				grades text {'A','B','C','D','E'} not null;
67				department references departments.id;
68				)
69
70				# Department table
71				table departments (
72				id serial;
73				name text /\w{2,10}/ not null;
74				)
75
76				10 employees;
77				2 departments;
78
79				Here we're using the little language (or DSL) defined by C<cheater> itself. It's semantics
80				is self-explanatory. In particular, the last two lines state that we want to generate 10 rows
81				of data for the C<employees> table and 2 rows for the C<departments> table.
82
83				And then, we use the F<cht-compile> command to compile our F<company.cht> file to generate a
84				random database instance:
85
86				$ cht-compile company.cht
87				Wrote ./data/departments.schema.json
88				Wrote ./data/departments.rows.json
89				Wrote ./data/employees.schema.json
90				Wrote ./data/employees.rows.json
91
92				We see that it generates two F<.json> data files for the C<departments> and C<employees> tables,
93				respectively. For example, the F<data/emplyees.rows.json> file on my machine resulting from
94				a particular run looks like this:
95
96				$ cat data/employees.rows.json
97				[["id","name","age","tel","birthday","height","grades","department"],
98				["7606","Kxhwcn Cflub",54,"15872171866","2011-04-01","1.67276","D","408862"],
99				["63649","Whf Iajgw",55,"13850771916",null,"1.65297","E","844615"],
100				["348161","Nnwe Obfkln",27,"15801601215","2011-03-06","1.69275","D","408862"],
101				["353404","Shgpak Xvqxw",28,"15816453097",null,"1.67796","A","408862"],
102				["445500","Bdt Mhepht",47,"13855517847",null,"1.89943","C","844615"],
103				["513515","Ipsa Mcbtk",25,"13874017694","2011-01-06","1.79534","A","844615"],
104				["658009","Lboe Etqo",27,null,"2011-04-14","1.85162","E","408862"],
105				["716899","Gey Elacflr",18,"15804516095","2011-02-27","1.75681","A","844615"],
106				["945911","Hsuz Qcmky",39,"13862516775","2011-05-31","1.75947","B","408862"],
107				["960643","Qbmbe Ijnbqsb",24,"15872418765","2011-04-11","1.78864","B","844615"]]
108
109				These are the "row data". On the other hand, F<./data/employees.schema.json> is the table structure
110				definition for the C<employees> table. It looks like this on my side:
111
112				[{"attrs":[],"name":"id","type":"serial"},
113				{"attrs":["not null","unique"],"name":"name","type":"text"},
114				{"attrs":["not null"],"name":"age","type":"integer"},
115				{"attrs":[],"name":"tel","type":"text"},
116				{"attrs":[],"name":"birthday","type":"date"},
117				{"attrs":["not null"],"name":"height","type":"real"},
118				{"attrs":["not null"],"name":"grades","type":"text"},
119				{"attrs":[],"name":"department","type":"serial"}]
120
121				We can generate SQL DDL statement files accepted by RDBMSes like MySQL or PostgreSQL from the
122				F<.schema.json> files like this:
123
124				$ cht-schema2sql data/employees.schema.json
125				Wrote ./sql/employees.schema.sql
126
127				The output F<.sql> file looks like this:
128
129				$ cat ./sql/employees.schema.sql
130				drop table if exists employees;
131				create table employees (
132				id serial primary key,
133				name text not null unique,
134				age integer not null,
135				tel text,
136				birthday date,
137				height real not null,
138				grades text not null,
139				department serial
140				);
141
142				If we want to eliminate the drop table statement in the resulting SQL file, we can
143				specify the C<-n> option while running the F<cht-schema2sql> utility. For instance,
144
145				$ cht-schema2sql -n data/employees.schema.json
146				Wrote ./sql/employees.schema.sql
147
148				At last, we can use the F<cht-rows2sql> command to convert those F<.rows.json> data files to
149				F<.sql> files that are ready for relation database systems to import the "row data".
150
151				$ cht-rows2sql data/*.rows.json
152				Wrote ./sql/departments.rows.sql
153				Wrote ./sql/employees.rows.sql
154
155				The F<sql/departments.rows.sql> looks like this on my side:
156
157				$ cat sql/departments.rows.sql
158				insert into departments (id,name) values
159				(408862,'dJRq7LCXL'),
160				(844615,'G_m9Nkh3q');
161
162				To prevent the resulting data from conflicting with extra unique key restrictions in the targeting
163				RDMBS table, we can use the C<-r> option to make F<cht-rows2sql> generate SQL replace statements
164				to work-around this:
165
166				$ cht-rows2sql -r data/*.rows.json
167				Wrote ./sql/departments.rows.sql
168				Wrote ./sql/employees.rows.sql
169
170				Now we're ready to import the random data into database systems like MySQL!
171
172				$ mysql -u some_user -p dbname < sql/departments.rows.sql
173
174				For now, C<cheater> is still in active development and lacking comprehensive documentation,
175				the most complete documentation is its (declarative) test suite:
176
177				L<http://github.com/agentzh/cheater/tree/master/t/>
178
179				Open one of those F<.t> files, you can see lots of declarative test cases, like these:
180
181				=== TEST 5: datetime range domain
182				--- src
183				table cats (
184				birthday datetime 2010-05-24 03:45:00..2010-06-05 18:46:05 not null;
185				)
186
187				5 cats;
188				--- out
189				cats
190				birthday
191				2010-06-02 14:59:02
192				2010-06-04 03:31:00
193				2010-06-03 01:51:41
194				2010-05-28 19:29:34
195				2010-06-02 13:31:38
196
197				=head1 INSTALLATION
198
199				perl Makefile.PL
200				make
201				make test
202				make install
203
204				SOURCE REPOSITORY
205				The source repository of this project is on GitHub:
206
207				L<http://github.com/agentzh/cheater/>
208
209				If you have found any bugs or feature request, feel free to create tickets on the GitHub issues page:
210
211				L<http://github.com/agentzh/cheater/issues>
212
213				=head1 AUTHOR
214
215				Zhang "agentzh" Yichun (ç« äº¦æ˜¥) C<< <agentzh@gmail.com> >>
216
217				=head1 COPYRIGHT & LICENSE
218
219				Copyright (c) 2010-2011, Taobao Inc., Alibaba Group (L<http://www.taobao.com>).
220
221				Copyright (c) 2010-2011, Zhang "agentzh" Yichun (ç« äº¦æ˜¥) C<< <agentzh@gmail.com> >>.
222
223				This module is licensed under the terms of the BSD license.
224
225				Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
226
227				=over
228
229				=item *
230
231				Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
232
233				=item *
234
235				Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
236
237				=item *
238
239				Neither the name of the Taobao Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
240
241				=back
242
243				THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
244
245				=head1 SEE ALSO
246
247				L<Parse::RandGen::Regexp>, L<Data::Random>.
248