Python Programming for Data Science


Python Exercises

GitHub Repository:


One Example from the Exercises:

Task 7:

Three lists are given below, holding the course codes, credits, and quotas, respectively. Use zip() to print the information for each course.

ders_kodu = ['CMP1005', 'PSY1001', 'HUK1005', 'SEN2204']
kredi = [3, 4, 2, 4]
kontenjan = [30, 75, 150, 25]

📝 Notes

  • 'ders_kodu' refers to 'course code'
  • 'kredi' refers to 'credit'
  • 'kontenjan' refers to 'quota'
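
One possible approach to Task 7 is sketched below. The printed sentence format and the English loop variable names are assumptions for illustration; the task itself only asks that zip() be used.

```python
ders_kodu = ['CMP1005', 'PSY1001', 'HUK1005', 'SEN2204']
kredi = [3, 4, 2, 4]
kontenjan = [30, 75, 150, 25]

# zip() pairs the i-th element of each list into one tuple per course.
courses = list(zip(ders_kodu, kredi, kontenjan))

for code, credit, quota in courses:
    # The output wording is an assumption, not part of the task statement.
    print(f"Course {code}: {credit} credits, quota of {quota} students")
```

Because zip() stops at the shortest input, the three lists should have the same length, as they do here.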

List Comprehension Exercises

GitHub Repository:


Task 1:

Using a list comprehension, convert all column names in the 'car_crashes' dataset to uppercase and add the prefix 'NUM_' to the names of the numeric columns.

import seaborn as sns
df = sns.load_dataset('car_crashes')
df.columns

Expected Output:

['NUM_TOTAL',
'NUM_SPEEDING',
'NUM_ALCOHOL',
'NUM_NOT_DISTRACTED',
'NUM_NO_PREVIOUS',
'NUM_INS_PREMIUM',
'NUM_INS_LOSSES',
'ABBREV']

📝 Notes

  • All variable names must be uppercase.
  • Must be done with a single list comprehension structure.
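
A minimal sketch of one way to solve Task 1 follows. To stay runnable without a network connection, it uses a small hypothetical stand-in frame with the same column names as 'car_crashes' instead of calling sns.load_dataset; the 'no_previous' and 'abbrev' values are placeholders. The numeric check via pd.api.types.is_numeric_dtype is one option among several.

```python
import pandas as pd

# Hypothetical stand-in for sns.load_dataset('car_crashes').
# Only the column names and dtypes matter for this task.
df = pd.DataFrame({
    "total": [18.8, 18.1],
    "speeding": [7.332, 7.421],
    "alcohol": [5.640, 4.525],
    "not_distracted": [18.048, 16.290],
    "no_previous": [15.0, 17.0],      # placeholder values
    "ins_premium": [784.55, 1053.48],
    "ins_losses": [145.08, 133.93],
    "abbrev": ["AL", "AK"],           # placeholder values
})

# Uppercase every name; prefix only the numeric columns with 'NUM_'.
new_names = [
    "NUM_" + col.upper() if pd.api.types.is_numeric_dtype(df[col]) else col.upper()
    for col in df.columns
]
print(new_names)
```

Assigning the result back with df.columns = new_names would rename the columns in place.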

Task 2:

Using a list comprehension, append '_FLAG' to the names of the columns in the 'car_crashes' dataset whose names do not contain 'no'.

Expected Output:

['TOTAL_FLAG',
'SPEEDING_FLAG',
'ALCOHOL_FLAG',
'NOT_DISTRACTED',
'NO_PREVIOUS',
'INS_PREMIUM_FLAG',
'INS_LOSSES_FLAG',
'ABBREV_FLAG']

📝 Notes

  • All variable names must be uppercase.
  • Must be done with a single list comprehension structure.
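
Task 2 depends only on the column names, so the sketch below works on a plain list that mirrors df.columns of 'car_crashes'. The list is written out by hand here as an assumption, to keep the example self-contained.

```python
# Column names of 'car_crashes', written out by hand for illustration.
columns = ["total", "speeding", "alcohol", "not_distracted",
           "no_previous", "ins_premium", "ins_losses", "abbrev"]

# Names containing 'no' are only uppercased; all others also get '_FLAG'.
flagged = [
    col.upper() if "no" in col else col.upper() + "_FLAG"
    for col in columns
]
print(flagged)
```

Note that the substring test also matches 'not_distracted', which is why it appears without the suffix in the expected output.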

Task 3:

Using a list comprehension, select the variables whose names are NOT in the list given below and create a new dataframe from them.

og_list = ['abbrev', 'no_previous']

Expected Output:

  total speeding alcohol not_distracted ins_premium ins_losses
0 18.800 7.332 5.640 18.048 784.550 145.080
1 18.100 7.421 4.525 16.290 1053.480 133.930
2 18.600 6.510 5.208 15.624 899.470 110.350
3 22.400 4.032 5.824 21.056 827.340 142.390
4 12.000 4.200 3.360 10.920 878.410 165.630

📝 Note

  • First, create a new list named new_cols using list comprehension according to the list above. Then create a new df by selecting these variables with df[new_cols] and name it new_df.
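
The steps in the note can be sketched as follows. The frame is again a hypothetical two-row stand-in for 'car_crashes'; the values of the two excluded columns are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for sns.load_dataset('car_crashes').
df = pd.DataFrame({
    "total": [18.8, 18.1],
    "speeding": [7.332, 7.421],
    "alcohol": [5.640, 4.525],
    "not_distracted": [18.048, 16.290],
    "no_previous": [15.0, 17.0],      # placeholder values
    "ins_premium": [784.55, 1053.48],
    "ins_losses": [145.08, 133.93],
    "abbrev": ["AL", "AK"],           # placeholder values
})

og_list = ['abbrev', 'no_previous']

# Keep every column whose name is not listed in og_list.
new_cols = [col for col in df.columns if col not in og_list]
new_df = df[new_cols]
print(new_df.head())
```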

Requirements

  • pandas==1.5.1
  • seaborn==0.12.1

Pandas Exercises

GitHub Repository:


Some Examples from the Exercises:


Task 1:

Load the 'titanic' dataset from the Seaborn library.

Task 2:

Find the number of male and female passengers in the 'titanic' dataset from Task 1.

Task 3:

Find the number of unique values for each column.
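
A short sketch of Tasks 2 and 3 is given below. With Seaborn installed and a network connection, Task 1 would be df = sns.load_dataset('titanic'); here a tiny invented frame with a few similarly named columns stands in for it, so the counts are illustrative only.

```python
import pandas as pd

# Invented stand-in for sns.load_dataset('titanic'); values are illustrative.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0],
    "sex": ["male", "female", "female", "male", "male"],
    "class": ["Third", "First", "Third", "Third", "Second"],
})

# Task 2: number of passengers per value of 'sex'.
sex_counts = df["sex"].value_counts()
print(sex_counts)

# Task 3: number of unique values in each column.
unique_counts = df.nunique()
print(unique_counts)
```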

Requirements

  • numpy==1.24.3
  • pandas==1.5.1
  • seaborn==0.12.1

Calculating Lead-Based Returns with Rule-Based Classification using Persona's Dataset

GitHub Repository:


Business Problem

A game company wants to use some of its customers' features to create new level-based customer definitions (personas), group these personas into segments, and estimate how much revenue a new customer can be expected to bring in on average, based on the segment they fall into.

For example, the company wants to determine how much revenue a 25-year-old male iOS user from Türkiye can be expected to bring in on average.

Dataset Story

The persona.csv dataset contains the prices of the products sold by an international game company and some demographic information of the users who buy these products. The dataset consists of records created in each sales transaction. This means that the table is not deduplicated. In other words, a user with certain demographic characteristics may have made more than one purchase.

Variables
  • PRICE: Customer spending amount
  • SOURCE: The type of device the customer is connecting with
  • SEX: Gender of the customer
  • COUNTRY: Country of the customer
  • AGE: Age of the customer

Before Application Dataset:

  PRICE SOURCE SEX COUNTRY AGE
0 39 android male bra 17
1 39 android male bra 17
2 49 android male bra 17
3 29 android male tur 17
4 49 android male tur 17

Expected Output:

  customers_level_based PRICE SEGMENT
0 BRA_ANDROID_FEMALE_0_18 35.6453 B
1 BRA_ANDROID_FEMALE_19_23 34.0773 C
2 BRA_ANDROID_FEMALE_24_30 33.8639 C
3 BRA_ANDROID_FEMALE_31_40 34.8983 B
4 BRA_ANDROID_FEMALE_41_66 36.7371 A
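
The path from the raw table to the expected output can be sketched roughly as below. This is a sketch under assumptions, not the repository's code: the age bins are inferred from the labels in the expected output, segments are assumed to be price quartiles with A as the highest, and rows 5 to 7 of the sample frame are invented so that more than two personas exist.

```python
import pandas as pd

# Rows 0-4 are the sample rows shown above; rows 5-7 are invented for illustration.
df = pd.DataFrame({
    "PRICE":   [39, 39, 49, 29, 49, 59, 19, 9],
    "SOURCE":  ["android", "android", "android", "android", "android",
                "ios", "ios", "android"],
    "SEX":     ["male", "male", "male", "male", "male",
                "male", "female", "female"],
    "COUNTRY": ["bra", "bra", "bra", "tur", "tur", "tur", "tur", "bra"],
    "AGE":     [17, 17, 17, 17, 17, 25, 35, 45],
})

# 1. Average spend for each combination of the four features.
agg_df = df.groupby(["COUNTRY", "SOURCE", "SEX", "AGE"], as_index=False)["PRICE"].mean()

# 2. Bin ages; edges are inferred from the labels in the expected output.
bins = [0, 18, 23, 30, 40, 66]
labels = ["0_18", "19_23", "24_30", "31_40", "41_66"]
agg_df["AGE_CAT"] = pd.cut(agg_df["AGE"], bins=bins, labels=labels)

# 3. Build the level-based persona key, e.g. TUR_IOS_MALE_24_30.
agg_df["customers_level_based"] = (
    agg_df["COUNTRY"] + "_" + agg_df["SOURCE"] + "_" + agg_df["SEX"]
    + "_" + agg_df["AGE_CAT"].astype(str)
).str.upper()

# 4. Several ages can map to one persona, so average once more per persona.
persona_df = agg_df.groupby("customers_level_based", as_index=False)["PRICE"].mean()

# 5. Assumed segmentation: price quartiles, with A as the highest-spending group.
persona_df["SEGMENT"] = pd.qcut(persona_df["PRICE"], 4, labels=["D", "C", "B", "A"])

# Look up the example user: 25-year-old male iOS user from Türkiye.
new_user = "TUR_IOS_MALE_24_30"
result = persona_df[persona_df["customers_level_based"] == new_user]
print(result)
```

On the full dataset the same lookup would return the average price and segment for that persona, which is the estimate the business problem asks for.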

Requirements

  • pandas==1.5.1

Calculating Lead-Based Returns with Rule-Based Classification using Gezinomi's Dataset

GitHub Repository:


Business Problem

Gezinomi wants to use some features of its sales to create new level-based sales definitions, group these definitions into segments, and estimate how much revenue a new customer can be expected to bring in on average, based on the segment they fall into.

For example, the company wants to determine how much a customer who wants to stay at an All Inclusive hotel in Antalya during a busy period can be expected to spend on average.

Dataset Story

The gezinomi.xlsx dataset contains the prices of the sales made by the Gezinomi company and information about these sales. The dataset consists of records created in each sales transaction. This means that the table is not deduplicated. In other words, a customer may have made more than one purchase.

Variables
  • SaleId: Sales id
  • SaleDate: Sale Date
  • CheckInDate: Customer's check-in date
  • Price: Price paid for sale
  • ConceptName: Hotel concept information
  • SaleCityName: The city where the hotel is located
  • CInDay: Customer's check-in day
  • SaleCheckInDayDiff: Number of days between the sale date and the check-in date
  • Seasons: Season information on the check-in date

Before Application Dataset:

  SaleId SaleDate CheckInDate Price ConceptName SaleCityName CInDay SaleCheckInDayDiff Seasons
0 415122 12/3/2022 12/3/2022 79.30403 Herşey Dahil Antalya Saturday 0 Low
1 415103 12/3/2022 12/3/2022 45.9707 Yarım Pansiyon Antalya Saturday 0 Low
2 404034 9/12/2022 9/13/2022 77.83883 Herşey Dahil Antalya Tuesday 1 High
3 415094 12/3/2022 12/10/2022 222.7106 Yarım Pansiyon İzmir Saturday 7 Low
4 414951 12/1/2022 12/3/2022 140.4762 Yarım Pansiyon İzmir Saturday 2 Low

Expected Output:

  sales_level_based Price SEGMENT
0 GIRNE_HERŞEY DAHIL_HIGH 103.9354 A
1 GIRNE_HERŞEY DAHIL_LOW 90.93594 A
2 İZMIR_YARIM PANSIYON_HIGH 87.6573 A
3 DIĞER_HERŞEY DAHIL_LOW 87.31088 A
4 DIĞER_HERŞEY DAHIL_HIGH 83.78727 A
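
A rough sketch of how the sales_level_based keys could be built and queried is shown below, using only the five sample rows shown above. The key layout (city, concept, season) is read off the expected output; the quartile segmentation with A as the highest group and the helper name lookup_sale are assumptions for illustration. 'Herşey Dahil' is the Turkish label for All Inclusive.

```python
import pandas as pd

# The five sample rows shown above, limited to the columns used here.
df = pd.DataFrame({
    "Price": [79.30403, 45.9707, 77.83883, 222.7106, 140.4762],
    "ConceptName": ["Herşey Dahil", "Yarım Pansiyon", "Herşey Dahil",
                    "Yarım Pansiyon", "Yarım Pansiyon"],
    "SaleCityName": ["Antalya", "Antalya", "Antalya", "İzmir", "İzmir"],
    "Seasons": ["Low", "Low", "High", "Low", "Low"],
})

# Average price per city, concept, and season.
agg_df = df.groupby(["SaleCityName", "ConceptName", "Seasons"], as_index=False)["Price"].mean()

# Build the level-based key. Python's upper() maps both 'i' and 'ı' to 'I',
# which is why the expected output reads 'İZMIR' rather than 'İZMİR'.
agg_df["sales_level_based"] = (
    agg_df["SaleCityName"] + "_" + agg_df["ConceptName"] + "_" + agg_df["Seasons"]
).str.upper()

# Assumed segmentation: price quartiles, with A as the highest-priced group.
agg_df["SEGMENT"] = pd.qcut(agg_df["Price"], 4, labels=["D", "C", "B", "A"])


def lookup_sale(key):
    """Hypothetical helper: return the rows matching a sales_level_based key."""
    return agg_df.loc[agg_df["sales_level_based"] == key,
                      ["sales_level_based", "Price", "SEGMENT"]]


# All Inclusive hotel in Antalya during the high season.
result = lookup_sale("ANTALYA_HERŞEY DAHIL_HIGH")
print(result)
```

With only five rows the averages and segments are not meaningful; the point is the shape of the pipeline, which would be run on the full gezinomi.xlsx table.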

Requirements

  • pandas==1.5.1