1) Market Basket list (default). No header row is expected, and the number of columns may vary from row to row.
If a header row exists, the 1st parameter of the extra parameters field must declare the string that represents an absent item. If absent items are simply empty, or there are none, assign "" or "nan".
The item columns that take part in the analysis must be declared in the extra parameters field, starting from the 2nd parameter.
Examples
citrus fruit,semi-finished bread,margarine,ready soups
tropical fruit,yogurt,coffee
whole milk
pip fruit,yogurt,cream cheese,meat spreads
other vegetables,whole milk,condensed milk,long life bakery product
...
"nan" "Item 1" "Item 2" "Item 3" "Item 4" "Item 5" "Item 6" "Item 7" "Item 8"
Item(s),Item 1,Item 2,Item 3,Item 4,Item 5,Item 6,Item 7,Item 8,Item 9,Item 10,Item 11,Item 12
4,citrus fruit,semi-finished bread,margarine,ready soups,,,,,,,,,,
3,tropical fruit,yogurt,coffee,,,,,,,,,,,,,
1,whole milk,,,,,,,,,,,,
4,pip fruit,yogurt,cream cheese,meat spreads,,,,,,,,,,,,,,,
4,other vegetables,whole milk,condensed milk,long life bakery product,,,,,,,,,,,,,,,,,,
...
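The two variants above can be read into a list of baskets with a few lines of Python. This is only a minimal sketch and not the code WebApriori runs: the function names split_extra_parameters and read_basket_list are hypothetical, and the sketch assumes that the extra parameters field holds space-separated quoted strings, as in the example line above.

```python
import csv
import io
import shlex


def split_extra_parameters(field):
    # Assumption: the field holds space-separated, quoted strings.
    return shlex.split(field)


def read_basket_list(text, absent=None, item_columns=None):
    """Return one list of items per data row (hypothetical helper)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if item_columns is None:
        # No header row: every non-empty cell is an item.
        return [[cell.strip() for cell in row if cell.strip()] for row in rows]
    header, body = rows[0], rows[1:]
    # Map the declared column names to their positions in the header.
    positions = [header.index(name) for name in item_columns]
    skip = {"", absent}
    if absent in ("", "nan"):
        skip.add("nan")  # "" and "nan" both mean "nothing in this cell"
    baskets = []
    for row in body:
        cells = [row[i].strip() for i in positions if i < len(row)]
        baskets.append([cell for cell in cells if cell not in skip])
    return baskets
```

With the header variant, the first value returned by split_extra_parameters would be passed as absent and the remaining values as item_columns.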
2) Order/Invoice detail. A header row is mandatory for this type. The number of columns in the dataset is fixed.
In the extra parameters field, the 1st parameter names the primary key column and the 2nd parameter names the items column. Both are mandatory.
Example
"InvoiceNo" "StockCode"
or "InvoiceNo" "Description"
InvoiceNo;StockCode;Description;Quantity;InvoiceDate;UnitPrice;CustomerID
536365;85123A;WHITE HANGING HEART T-LIGHT HOLDER;6;1/12/2010 8:26;2,55;17850
536365;71053;WHITE METAL LANTERN;6;1/12/2010 8:26;3,39;17850
536365;84406B;CREAM CUPID HEARTS COAT HANGER;8;1/12/2010 8:26;2,75;17850
536365;84029G;KNITTED UNION FLAG HOT WATER BOTTLE;6;1/12/2010 8:26;3,39;17850
536366;22633;HAND WARMER UNION JACK;6;1/12/2010 8:28;1,85;17850
536366;22632;HAND WARMER RED POLKA DOT;6;1/12/2010 8:28;1,85;17850
536367;84879;ASSORTED COLOUR BIRD ORNAMENT;32;1/12/2010 8:34;1,69;13047
536367;22745;POPPY'S PLAYHOUSE BEDROOM ;6;1/12/2010 8:34;2,1;13047
536367;22748;POPPY'S PLAYHOUSE KITCHEN;6;1/12/2010 8:34;2,1;13047
...
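In this format each basket is spread over several rows that share the same key. The following is a minimal sketch of how such rows could be grouped, not WebApriori's own implementation: read_order_detail is a hypothetical name, and the ";" delimiter is assumed from the sample above.

```python
import csv
import io


def read_order_detail(text, key_column, item_column, delimiter=";"):
    """Group the item column by the primary key column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    baskets = {}
    for row in reader:
        item = (row.get(item_column) or "").strip()
        if item:
            # Rows that share the same key belong to the same basket.
            baskets.setdefault(row[key_column], []).append(item)
    return baskets
```

Calling it with key_column="InvoiceNo" and item_column="StockCode" corresponds to the first extra parameters line in the example, and item_column="Description" to the second.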
3) Sparse item dataset. A header row is mandatory for this type of dataset, and the number of columns/items is fixed.
The item columns must be declared in the extra parameters field, starting from the 2nd parameter.
The 1st parameter must declare the string that represents an absent item. If absent items are simply empty, assign '' or 'nan'.
Examples
"?" "d1" "d2" "diapers" "d4" "beers" "d6" "d7" "d8" "d9" "grocery" "d11" "baby needs" "bread" "coupons" "juice"
d1,d2,d3,d4,d5,d6,d7,d8,d9,grocery,d11,baby needs,bread,coupons,juice,...
?,?,?,?,?,?,?,?,?,?,?,t,t,t,?,t,...
t,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,...
?,?,?,?,?,?,?,?,?,?,?,?,t,t,?,t,...
t,?,?,?,?,?,?,?,?,?,?,?,t,t,?,t,...
?,?,?,?,?,?,?,?,?,?,?,?,t,t,?,t,...
?,?,t,?,?,?,t,?,?,?,?,?,t,t,?,t,...
t,?,?,?,?,?,?,?,?,?,?,?,t,t,?,t,...
?,?,?,?,?,?,?,?,?,?,?,t,t,t,?,t,...
...
"0" "air fresheners candles" "asian foods" "baby accessories" "baby bath" "body care"
order_id,pay_method,arrival_timestamp,air fresheners candles,asian foods,baby accessories,baby bath,body care,...
930878,2,0:00:22,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...
149444,2,0:02:45,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...
327583,1,0:03:17,3,0,0,0,0,0,1,4,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,...
576050,2,0:03:33,3,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,...
200249,1,0:03:59,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,...
608770,2,0:04:29,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,...
...
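A sparse row becomes a basket by keeping the names of the declared columns whose cell is not the absent marker. The sketch below illustrates that idea only. read_sparse_items is a hypothetical name, and treating any non-absent value (such as a quantity of 3) as "item present" is an assumption, not documented WebApriori behaviour.

```python
import csv
import io


def read_sparse_items(text, absent, item_columns, delimiter=","):
    """Return, per row, the declared columns whose cell is not absent."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    skip = {"", absent}
    if absent in ("", "nan"):
        skip.add("nan")  # "" and "nan" both mean an empty cell
    baskets = []
    for row in reader:
        # Columns that are not declared (e.g. order_id) are ignored.
        baskets.append(
            [name for name in item_columns
             if (row.get(name) or "").strip() not in skip]
        )
    return baskets
```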
4) Columns with multiple categorical values. A header row is optional.
The number of columns is fixed. The item columns are declared in the extra parameters field only when a header row exists.
Example
1st,adult,male,yes
1st,adult,male,yes
1st,adult,male,yes
1st,adult,male,no
1st,adult,female,yes
1st,adult,female,yes
1st,adult,female,yes
...
"class" "age" "sex" "survived"
class,age,sex,survived
1st,adult,male,yes
1st,adult,male,yes
1st,adult,male,yes
1st,adult,male,no
1st,adult,female,yes
1st,adult,female,yes
1st,adult,female,yes
...
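Because the same value can appear in different columns, one common convention is to turn each cell into a "column=value" item before mining. The sketch below follows that convention as an assumption. The source does not say how WebApriori encodes these rows, and both the read_categorical name and the "colN" labels used when there is no header row are hypothetical.

```python
import csv
import io


def read_categorical(text, item_columns=None, delimiter=","):
    """Turn each row into 'column=value' items (hypothetical helper)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    if item_columns is None:
        # No header row: label the columns by their 1-based position.
        return [[f"col{i}={value}" for i, value in enumerate(row, start=1)]
                for row in rows]
    header, body = rows[0], rows[1:]
    # Keep only the declared columns, in the declared order.
    positions = [(name, header.index(name)) for name in item_columns]
    return [[f"{name}={row[i]}" for name, i in positions] for row in body]
```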
Public datasets can be used freely by all users registered with the WebApriori application. For security reasons, only researchers with advanced privileges can upload public datasets. Those researchers can choose whether to upload a private dataset, visible only to them and to their web service client, or a public one, visible to all users. Public datasets are very useful for professors who need to share datasets with their students for learning or experimental purposes.
If you are interested in uploading a public dataset, please send an email to the WebApriori Administrator.
stoug@ihu.gr