Python Programming for Data Science

Python Exercises
List Comprehension Exercises
Pandas Exercises
Rule-Based Classification I (Persona)
Rule-Based Classification II (Gezinomi)

Python Exercises

One Example from the Exercises:

Task 7:

Three lists are given below, holding the course codes, credits, and quotas, respectively. Use zip() to print the information for each course.

ders_kodu = ['CMP1005', 'PSY1001', 'HUK1005', 'SEN2204']
kredi = [3, 4, 2, 4]
kontenjan = [30, 75, 150, 25]

Expected Output:

Kredisi 3 olan CMP1005 kodlu dersin kontenjanı 30 kişidir.
Kredisi 4 olan PSY1001 kodlu dersin kontenjanı 75 kişidir.
Kredisi 2 olan HUK1005 kodlu dersin kontenjanı 150 kişidir.
Kredisi 4 olan SEN2204 kodlu dersin kontenjanı 25 kişidir.

📝 Notes

'ders_kodu' means 'course code'
'kredi' means 'credit'
'kontenjan' means 'quota'
The first output line translates to: 'The course with code CMP1005, worth 3 credits, has a quota of 30 people.'
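
One possible way to approach this task, shown as a minimal sketch rather than the reference solution: zip() pairs the three lists element by element, and an f-string fills in the sentence. The name lines is introduced here only for illustration.

```python
ders_kodu = ['CMP1005', 'PSY1001', 'HUK1005', 'SEN2204']
kredi = [3, 4, 2, 4]
kontenjan = [30, 75, 150, 25]

# zip() yields one (code, credit, quota) tuple per course.
lines = [
    f"Kredisi {k} olan {kod} kodlu dersin kontenjanı {q} kişidir."
    for kod, k, q in zip(ders_kodu, kredi, kontenjan)
]

for line in lines:
    print(line)
```

Collecting the sentences in a list before printing is a design choice that makes the result easy to check; printing directly inside a for loop over zip() would work just as well.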

List Comprehension Exercises

Task 1:

Using a list comprehension, convert the variable names in the 'car_crashes' data to uppercase and add the prefix 'NUM_' to the names of the numeric variables.

import seaborn as sns
df = sns.load_dataset('car_crashes')
df.columns

Expected Output:

['NUM_TOTAL',
 'NUM_SPEEDING',
 'NUM_ALCOHOL',
 'NUM_NOT_DISTRACTED',
 'NUM_NO_PREVIOUS',
 'NUM_INS_PREMIUM',
 'NUM_INS_LOSSES',
 'ABBREV']

📝 Notes

All variable names must be uppercase.
This must be done with a single list comprehension.
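
A minimal sketch of one way to do this, not necessarily the intended solution. To stay runnable without a download, it assumes a one-row stand-in DataFrame with the same column names as 'car_crashes'; the values marked as placeholders are invented. Numeric columns are detected with pd.api.types.is_numeric_dtype.

```python
import pandas as pd

# Stand-in for sns.load_dataset('car_crashes'): same column names, one row.
df = pd.DataFrame({
    "total": [18.8], "speeding": [7.332], "alcohol": [5.64],
    "not_distracted": [18.048],
    "no_previous": [0.0],   # placeholder value, not real data
    "ins_premium": [784.55], "ins_losses": [145.08],
    "abbrev": ["XX"],       # placeholder value, not real data
})

# Uppercase every name; prefix only the numeric columns with 'NUM_'.
new_names = [
    "NUM_" + col.upper() if pd.api.types.is_numeric_dtype(df[col]) else col.upper()
    for col in df.columns
]
print(new_names)
```

With the real dataset, replacing the stand-in with sns.load_dataset('car_crashes') should leave the comprehension unchanged.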

Task 2:

Using a list comprehension, append '_FLAG' to the names of the variables in the 'car_crashes' data that do not contain 'no'.

Expected Output:

['TOTAL_FLAG',
 'SPEEDING_FLAG',
 'ALCOHOL_FLAG',
 'NOT_DISTRACTED',
 'NO_PREVIOUS',
 'INS_PREMIUM_FLAG',
 'INS_LOSSES_FLAG',
 'ABBREV_FLAG']

📝 Notes

All variable names must be uppercase.
This must be done with a single list comprehension.
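
A minimal sketch of the idea, assuming the column names are available as a plain list (here typed out by hand; with the real data they would come from df.columns). The names columns and flagged are illustrative only.

```python
# Column names of 'car_crashes', written out so the sketch runs on its own.
columns = ['total', 'speeding', 'alcohol', 'not_distracted',
           'no_previous', 'ins_premium', 'ins_losses', 'abbrev']

# Check for 'no' on the original lowercase name, then uppercase the result.
flagged = [
    col.upper() + "_FLAG" if "no" not in col else col.upper()
    for col in columns
]
print(flagged)
```

The substring test runs before the name is uppercased; testing for 'NO' after uppercasing would give the same result.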

Task 3:

Using a list comprehension, select the variables whose names are NOT in the list given below and create a new dataframe from them.

og_list = ['abbrev', 'no_previous']

Expected Output:

	total	speeding	alcohol	not_distracted	ins_premium	ins_losses
0	18.800	7.332	5.640	18.048	784.550	145.080
1	18.100	7.421	4.525	16.290	1053.480	133.930
2	18.600	6.510	5.208	15.624	899.470	110.350
3	22.400	4.032	5.824	21.056	827.340	142.390
4	12.000	4.200	3.360	10.920	878.410	165.630

📝 Note

First, use a list comprehension to build a list named new_cols containing the column names that are not in og_list. Then select these columns with df[new_cols] and assign the result to new_df.
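
The two steps in the note can be sketched as follows. This assumes a two-row stand-in for the 'car_crashes' data: the six kept columns reuse the first two rows of the expected output above, while the values for 'no_previous' and 'abbrev' are invented placeholders.

```python
import pandas as pd

# Stand-in for sns.load_dataset('car_crashes').
df = pd.DataFrame({
    "total": [18.8, 18.1], "speeding": [7.332, 7.421],
    "alcohol": [5.64, 4.525], "not_distracted": [18.048, 16.29],
    "no_previous": [0.0, 0.0],      # placeholder values, not real data
    "ins_premium": [784.55, 1053.48], "ins_losses": [145.08, 133.93],
    "abbrev": ["XX", "YY"],         # placeholder values, not real data
})

og_list = ['abbrev', 'no_previous']

# Keep every column whose name is not listed in og_list.
new_cols = [col for col in df.columns if col not in og_list]
new_df = df[new_cols]
print(new_df.head())
```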

Requirements

pandas==1.5.1
seaborn==0.12.1

Pandas Exercises

Some Examples from the Exercises:

Task 1:

Load the 'titanic' dataset from the Seaborn library.

Task 2:

Find the number of male and female passengers in the 'titanic' dataset from Task 1.

Task 3:

Find the number of unique values for each column.
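
Tasks 2 and 3 can be approached roughly as below. This is a sketch that assumes a small invented stand-in for the 'titanic' data so it runs without a download; only the column names 'sex', 'pclass' and 'survived' are borrowed from the real dataset. With Seaborn available, Task 1 would be df = sns.load_dataset('titanic').

```python
import pandas as pd

# Invented stand-in for sns.load_dataset('titanic'); the rows are not real data.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male"],
    "pclass": [3, 1, 3, 1, 2],
    "survived": [0, 1, 1, 1, 0],
})

# Task 2: number of passengers per value of 'sex'.
sex_counts = df["sex"].value_counts()
print(sex_counts)

# Task 3: number of unique values in each column.
unique_counts = df.nunique()
print(unique_counts)
```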

Requirements

numpy==1.24.3
pandas==1.5.1
seaborn==0.12.1

Calculating Lead-Based Returns with Rule-Based Classification using Persona's Dataset

Business Problem
A game company wants to use some of its customers' features to create new level-based customer definitions (personas), group these personas into segments, and estimate how much revenue a new customer in each segment is likely to bring in on average.

For example: the company wants to estimate how much revenue a 25-year-old male iOS user from Türkiye is likely to bring in on average.

Dataset Story
The persona.csv dataset contains the prices of the products sold by an international game company and some demographic information of the users who buy these products. The dataset consists of records created in each sales transaction. This means that the table is not deduplicated. In other words, a user with certain demographic characteristics may have made more than one purchase.

Variables

PRICE: Customer spending amount
SOURCE: The type of device the customer is connecting with
SEX: Gender of the customer
COUNTRY: Country of the customer
AGE: Age of the customer

Before Application Dataset:

	PRICE	SOURCE	SEX	COUNTRY	AGE
0	39	android	male	bra	17
1	39	android	male	bra	17
2	49	android	male	bra	17
3	29	android	male	tur	17
4	49	android	male	tur	17

Expected Output:

	customers_level_based	PRICE	SEGMENT
0	BRA_ANDROID_FEMALE_0_18	35.6453	B
1	BRA_ANDROID_FEMALE_19_23	34.0773	C
2	BRA_ANDROID_FEMALE_24_30	33.8639	C
3	BRA_ANDROID_FEMALE_31_40	34.8983	B
4	BRA_ANDROID_FEMALE_41_66	36.7371	A
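
The path from the raw table to the expected output might look like the sketch below. Several things here are assumptions, not taken from the project itself: the age bins are inferred from the labels in the expected output (0_18 ... 41_66), the segments are assumed to be price quartiles built with pd.qcut, and the helper names agg_df and persona_df are illustrative. The input is just the five sample rows shown above.

```python
import pandas as pd

# The five sample rows from the "Before Application Dataset" table.
df = pd.DataFrame({
    "PRICE": [39, 39, 49, 29, 49],
    "SOURCE": ["android"] * 5,
    "SEX": ["male"] * 5,
    "COUNTRY": ["bra", "bra", "bra", "tur", "tur"],
    "AGE": [17] * 5,
})

# Average spending per demographic combination.
agg_df = df.groupby(["COUNTRY", "SOURCE", "SEX", "AGE"], as_index=False)["PRICE"].mean()

# Assumed age bins, inferred from the labels in the expected output.
bins = [0, 18, 23, 30, 40, 66]
labels = ["0_18", "19_23", "24_30", "31_40", "41_66"]
agg_df["AGE_CAT"] = pd.cut(agg_df["AGE"], bins=bins, labels=labels).astype(str)

# Build the persona key, e.g. BRA_ANDROID_MALE_0_18.
agg_df["customers_level_based"] = (
    agg_df["COUNTRY"] + "_" + agg_df["SOURCE"] + "_"
    + agg_df["SEX"] + "_" + agg_df["AGE_CAT"]
).str.upper()

# One row per persona, then assumed quartile segments (A = highest average).
persona_df = agg_df.groupby("customers_level_based", as_index=False)["PRICE"].mean()
persona_df["SEGMENT"] = pd.qcut(persona_df["PRICE"], 4, labels=["D", "C", "B", "A"])
print(persona_df)
```

A new customer can then be scored by building the same key from their attributes and looking up the matching row in persona_df.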

Requirements

pandas==1.5.1

Calculating Lead-Based Returns with Rule-Based Classification using Gezinomi's Dataset

Business Problem
Gezinomi wants to use some features of its sales to create new level-based sales definitions, group these definitions into segments, and estimate how much revenue a new customer in each segment is likely to bring in on average.

For example: the company wants to estimate how much revenue a customer who wants to stay at an All Inclusive hotel in Antalya during a busy period is likely to bring in on average.

Dataset Story
The gezinomi.xlsx dataset contains the prices of the sales made by Gezinomi and information about these sales. The dataset consists of records created in each sales transaction. This means that the table is not deduplicated. In other words, a customer may have made more than one purchase.

Variables

SaleId: Sales id
SaleDate: Sale Date
CheckInDate: Customer's check-in date
Price: Price paid for sale
ConceptName: Hotel concept information
SaleCityName: The city where the hotel is located
CInDay: Customer's check-in day
SaleCheckInDayDiff: Number of days between the sale date and the check-in date
Seasons: Season information on the check-in date

Before Application Dataset:

	SaleId	SaleDate	CheckInDate	Price	ConceptName	SaleCityName	CInDay	SaleCheckInDayDiff	Seasons
0	415122	12/3/2022	12/3/2022	79.30403	Herşey Dahil	Antalya	Saturday	0	Low
1	415103	12/3/2022	12/3/2022	45.9707	Yarım Pansiyon	Antalya	Saturday	0	Low
2	404034	9/12/2022	9/13/2022	77.83883	Herşey Dahil	Antalya	Tuesday	1	High
3	415094	12/3/2022	12/10/2022	222.7106	Yarım Pansiyon	İzmir	Saturday	7	Low
4	414951	12/1/2022	12/3/2022	140.4762	Yarım Pansiyon	İzmir	Saturday	2	Low

Expected Output:

	sales_level_based	Price	SEGMENT
0	GIRNE_HERŞEY DAHIL_HIGH	103.9354	A
1	GIRNE_HERŞEY DAHIL_LOW	90.93594	A
2	İZMIR_YARIM PANSIYON_HIGH	87.6573	A
3	DIĞER_HERŞEY DAHIL_LOW	87.31088	A
4	DIĞER_HERŞEY DAHIL_HIGH	83.78727	A
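
A rough sketch of how such a table could be produced and queried, using only the five sample rows shown above. The grouping columns are inferred from the format of sales_level_based; the quartile segmentation with pd.qcut and the helper function lookup are assumptions made for illustration, not the project's confirmed method.

```python
import pandas as pd

# Relevant columns of the five sample rows from the table above.
df = pd.DataFrame({
    "Price": [79.30403, 45.9707, 77.83883, 222.7106, 140.4762],
    "ConceptName": ["Herşey Dahil", "Yarım Pansiyon", "Herşey Dahil",
                    "Yarım Pansiyon", "Yarım Pansiyon"],
    "SaleCityName": ["Antalya", "Antalya", "Antalya", "İzmir", "İzmir"],
    "Seasons": ["Low", "Low", "High", "Low", "Low"],
})

# Average price per city / concept / season combination.
agg_df = df.groupby(["SaleCityName", "ConceptName", "Seasons"], as_index=False)["Price"].mean()

# Build the level-based key, e.g. ANTALYA_HERŞEY DAHIL_HIGH.
agg_df["sales_level_based"] = (
    agg_df["SaleCityName"] + "_" + agg_df["ConceptName"] + "_" + agg_df["Seasons"]
).str.upper()

# Assumed quartile segments (A = highest average price), sorted by price.
agg_df["SEGMENT"] = pd.qcut(agg_df["Price"], 4, labels=["D", "C", "B", "A"])
result = (
    agg_df[["sales_level_based", "Price", "SEGMENT"]]
    .sort_values("Price", ascending=False)
    .reset_index(drop=True)
)

def lookup(city, concept, season):
    """Hypothetical helper: return the row matching a new customer's request."""
    key = f"{city}_{concept}_{season}".upper()
    return result[result["sales_level_based"] == key]

print(lookup("Antalya", "Herşey Dahil", "High"))
```

The lookup call corresponds to the business example: an All Inclusive ("Herşey Dahil") hotel in Antalya during the high season.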

Requirements

pandas==1.5.1