Basics of Python and the integration with Data Science (With Pandas Library and SQL)
- Karl Chong
- Feb 25, 2022
- 11 min read
Introduction:
Data are individual statistical information while Data science is the analysis on data. It is an important field in Modern Society as data is a precious asset of any organization. Good analysis in data helps firms to understand and improve their process.
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. It is easy to use and provide high productivity. It also supports modules and packages to provide vast possibility for different usage. For example, one of the Python’s library, Pandas, facilitates data analysis. User can use SQL to fetch data and further manipulate the structured data using Python. Python becomes more and more popular in data science field because Python provide great functionality to deal with mathematics, statistics, and scientific function. It provides great libraries to deals with data science application.
My goal is to go through the basics of Python and its integration with Data Science. First, I am going to compare Python with Java and talk about basic Python syntax. Second, what is library and how to set up Python environment with Pandas. Third, how to perform basics data enquires and manipulation using Pandas.
All the demo code can be found at: https://github.com/ckskarl/PythonDemo
Basics of Python:
Before we start...
Python Version: The most common Python version nowadays is Python 3, which has a simpler syntax structure and easier to understand compare with Python 2.
Download and install latest version of Python 3: https://www.python.org/downloads/

IDE suggestion: PyCharm (https://www.jetbrains.com/pycharm/)
PyCharm is a free IDE providing a wide range of tools for Python developers, it also has large community to provide you with great support.


Place to test your coding skill: LeetCode (https://leetcode.com/problemset/all/?difficulty=EASY&page=1)
LeetCode is an online platforms for you to practice the programming skills by solving coding questions.

Key difference between Java and Python
1. Compile + Interpret at the same time for Python
Python source code with “.py” as extension while Java source code with “.java” as extension. In python, the compilation happens line by line while it is full code at once in Java. In Python, as soon as compilation step encounters first error, it will be reported, and compilation will fail. Once the code is compiled, PVM (Python Virtual Machine) executes the bytecode. Since bytecode is directly executed, it is comparatively slow than Java.
2. Indentation is more than styling
Java uses curly braces to define the block of each function and class definition, whereas Python uses indentation to separate code into separate blocks. Indentation is important when it comes to defining functions and classes.
It is important to make sure all the indentation works as intended! Usually, the IDE will automatically indent for you, but you can also manually indent the line/lines by “tab” (or 4 white space) or back indent by “shift+tab”. Also, remember there is no “;” at the end of line in Python. Indeed, you can still use semicolon in python to denote separation of line rather than termination of line.
3. There is no “main” function in Python
Python interpreter runs the code right from the first line and goes line by line. There is no dedicated main() function in Python. When the code is passed to the interpreter, the code that is at level 0 indentation is to be executed. It will define a few special variables before execution. “__name__” is one such special variable. If the source file is executed as the main program, the interpreter sets the “__name__” variable to have a value “__main__”. If this file is being imported from another module, “__name__” will be set to the module’s name.
“__name__” is a built-in variable which evaluates to the name of the current module.
p.s. In Python, Modules are simply files with the “.py” extension containing Python code that can be imported inside another Python Program.
Basics of Python (with video at the end of the section going through the demo code):
1. Python Comments
Comment starts with ‘#’ in Python while it starts with ‘//’ in Java. It works the same as the Java. Once it reads ‘#’ Python will ignore the rest of the line.
There is no build-in multi line comments in Python. The user can either add ‘#’ for each line or make user of triple-quotes(“””) string to act as multiline comment, as long as the string is not assigned to a variable.
demo code 1:comment
# this is line comment
"""this is a multiline string,
as long as it is not assigned
to a variable it serves as multiline comment"""2. Variable declaration and Data Types
Python sets data type of its variable based on the value that is assigned to it.
Python variables do not need explicit declaration to reserve memory space. The declaration happens automatically when you assign a value to a variable.
Unlike Java, Python will change the variable type if the variable value is set to another value.
Python has seven standard Data Types:
Data types | Definition and Common example |
Numbers | there are 4 specific number type in Python, in this article I will only list out the two most common types: int and float int: signed integer e.g. x = 10 float: (.) Floating point real values e.g. x = 45.67 |
String | Python uses single quotes ' or double quotes " to denote literal strings (‘str’ type). Python also use the triple quoted strings """ to continue across the end of line statement. p.s. Python does not have a character(char) type |
Boolean | The boolean value can be of two types either True or False. (‘bool’ type) |
List | List variables are declared by using brackets [] following the variable name. A List can contain different variable types. |
Set | Sets are used to store multiple items in a single variable using curly bracket {}. Noted a set will ignore duplicate items. A set can also contain different data types. A set is a collection which is unordered, unchangeable, and unindexed. |
Tuple | Tuple are a group of value like a list BUT tuples are fixed in size once they are assigned. Tuple items are ordered, unchangeable, and allow duplicate values. Tuples are defined by parenthesis (). |
Dictionary | Dictionaries are used to store data values in key: value pairs. Syntax = { ‘Key1’ : value1, ‘Key2’ : value2} |
demo code 2:data type
# int type
x = 10
print('x = 10', type(x))
# reassign x to a float value, no extra cast is needed
x = 45.67
print('x = 45.67', type(x))
# str type
x = "string"
print('x = \'string\'', type(x))
# there is no char type in python
x = 'c'
print('x = \'c\'', type(x))
# bool type
x = True # True or False
print('x = True', type(x))
# a list can contain different variable type
x = ['item1', 100, False, 100]
print('\nx = [\'item1\', 100, False]', type(x))
print('actual content of x:', x)
# a set can contain different variable type, but NO duplicate value, also the order is no conserved
x = {'item1', 100, False, 100}
print('\nx = {\'item1\', 100, False, 100}', type(x))
print('actual content of x:', x)
# a tuple can contain different variable type, allowing duplicate value, fixed
x = ('item1', 100, False, 100)
print('\nx = (\'item1\', 100, False, 100)', type(x))
print('actual content of x:', x)
# Dictionaries are used to store data values in key: value pairs.
x = {'First Person': ['Karl', 30], 'Second Person': ['Viola', 29]}
print('\nx = {\'First Person\': [\'Karl\', 30], \'Second Person\': [\'Viola\', 29]}', type(x))
print('actual content of x:', x)output:

3. Useful built-in functions
Function | Description |
print() | Prints to the standard output device |
len() | Returns the length of an object |
upper() | Returns a copy of the string which all case-based characters have been uppercased. |
lower() | Returns a copy of the string which all case-based characters have been lowercased. |
type() | Returns the type of the variable |
int() | Returns an integer number (round down) |
float() | Returns a floating point number |
bool() | Returns a boolean value of the specified object |
range(start,stop,step) | Create a sequence of numbers from start to stop, and print each item in the sequence. |
input() | Allowing user input |
demo code 3: Useful functions:
print('Python is fun')
x = 'abC!'
print(len(x))
print(x.upper())
print(x.lower())
print(type(x))
print(int(89.64))
print(float(721))
# the bool function returns False when the object is empty, False, 0 or None, otherwise it returns True
s = []
print(bool(s))
s = [1, 2]
print(bool(s))
for i in range(5):
print(i, end=' ')
print()
for i in range(2, 5):
print(i, end=' ')
print()
for i in range(2, 10, 3): # 2 5 8
print(i, end=' ')
print()
x = input('Enter your Name:')
print('Hello,' + x)
output:

4. Slice notation[::]
Python's slice notation is used to return a list or a portion of a list base on particular rules, we can choose the start index, end index and the step between
demo code 4: slice notation
string = 'abcdefg'
# slice notation: [start_at:stop_before:step]
# step is optional
# For start_at and stop_before,
# a negative value means counting from the end of the list instead of counting from the start (-1 is the last index)
# A negative step means that the list is sliced in reverse!
print(string) # print the original string
print(string[1:]) # start at index 1:b
print(string[:3]) # end before index 3:d
print(string[:-3]) # end before last 3 index:e
print(string[1:5:2]) # start at index 1:b, end before index5:f, step:2
print(string[::-1]) # reverse the whole string
output:

5. Operators
Most of the operators are same as java syntax.
i. Arithmetic operators in Python that is different from Java:
Operator | Meaning |
// | Floor division – the division that results into whole number that is rounded down e.g. 5//2 return 2 |
** | Exponent - left operand raised to the power of right e.g. 5**2 return 25 |
ii. Assignment operators:
Same pattern as Java. x = 5 is a simple assignment operator that assigns the value 5 on the right to the variable x on the left.
You can also use compound operators in Python like x **= 5 that adds to the variable and later assigns the same. It is equivalent to x = x ** 5.
iii. Comparison operators:
Same as Java.
iv. Logical operators
Operator | Meaning |
and (same as && in Java) | return True if both the operands are true e.g. True and Falsereturns False |
or (same as || in Java) | return True if either of the operands is true e.g. True or False returns True |
not (same as ! in Java) | return True if operand is false (complements the operand) e.g. not True returns False |
demo code 5: Operators
x = 5
y = 2
print('x = 5, y = 2')
# Output: x + y = 7
print('x + y =', x+y)
# Output: x - y = 3
print('x - y =', x-y)
# Output: x * y = 10
print('x * y =', x*y)
# Output: x / y = 2.5
print('x / y =', x/y)
# Output: x // y = 2 5/2=2.5
print('x // y =', x//y)
# Output: x ** y = 25 x^y 5^2
print('x ** y =', x**y)
# x = 5 , y = 2
print(x > y)
print(x != y)
print(x > y and x == 3) # &&
print(x > y or x == 3) # ||
output:

6. If statements
at the end of if, else and elif, there is a colon : to indicate the end of line
Notice that there is no token that denotes the end of the block. Rather, the end of the block is indicated by a line that is indented less than the lines of the block itself.
# syntax of if (proper indentation is important!)
# basic if:
if <condition>: # if condition is true run both statement1&2
<statement1>
<statement2>
<statement3>
# due to same indentation as if, statement3 will run regardless of the if
# i.e. statement 3 is OUTSIDE of the if statement!
# if else:
if <condition>:
<statement1> # if condition is true run statement1 skip statement2
else:
<statement2> # if condition is false run statement2 skip statement1
# elif is simply the short form of else if
# noted the elif need to be the same indentation as the if statment
if <condition1>:
<statement1> # if condition1 is true run statement1
elif <condition2>:
<statement2> # if condition1 is false, check condition2
demo code 6: If
lucky_number = 43
user_input = input('Enter a number between 0 to 100:')
user_input = int(user_input)
if user_input == lucky_number:
print("Awesome! Your number matches the lucky number!")
elif user_input < lucky_number: # else if
print("Your number is smaller than the lucky number.")
else:
print("Your number is greater than the lucky number.")
output:



7. Basic for-loops
at the end of for, there is a colon : to indicate the end of line
# syntax of loop
for <variable> in <condition>:
<statement>snippet from demo code 3:
for i in range(5):
print(i, end=' ')
print()
for i in range(2, 5):
print(i, end=' ')
print()
for i in range(2, 10, 3): # 2 5 8
print(i, end=' ')
print()output:

8. Functions
Functions in Python works as methods in Java. The goal is to make the code reusable and make the program into smaller sections.
Python make use of indentation to define a function:
# syntax of function:
def function_name(parameters):
<statement>
return <statement> # return is optional
Rules of creating function:
Keyword def to indicate the start of the function header
A function name to uniquely identify the function.
Parameters (arguments) through which we pass values to a function. They are optional and NO datatype is needed to be declare here.
A colon (:) to mark the end of the function header
One or more valid python statements that make up the function body. Statements must have the same indentation level (usually 4 spaces).
An optional return statement to return a value from the function.
demo code 7: function
sample = [1, 12, 34, 3, 16, 99, 13]
def find_greatest(numlist):
temp = numlist[0]
for i in range(1, len(numlist)):
if numlist[i] > temp:
temp = numlist[i]
return temp
print(find_greatest(sample))
output:

9. Classes and Objects
Python is also an object-oriented programming language. Most the code in Python is interpreted as an object, with its properties and methods.
In Python, a Class is like an object constructor.
demo code 8: Object
class Student:
def __init__(self, name, age, major):
self.name = name
self.age = age
self.major = major
s1 = Student("Karl", 29, 'ITSD')
print(s1)
print(s1.name)
print(s1.age)
print(s1.major)
output:

built-in __init__() function:
All classes have a function called __init__(), which is always executed when the class is being initiated. Use the __init__() function to assign values to object properties, or other operations that are necessary to do when the object is being created:
Following is a video to go through all the demo codes provided, it serves as a summarize of this section, enjoy^^
**[CLARIFY] In demo 4 (11:46),in the video I said "slice notation" can be treat as substring() in Java, but actually "slice notation" also works on other data structures, such as list.
Before we introduce the Python Library, let’s try to solve a question in LeetCode with both Java and Python to test our understanding in Python:D!
I will solve question 9 - Palindrome Number as an example:
Python Library
A library is a collection of files (called modules) that contains functions for use by other programs.
It may also contain data values and other things. (It is similar to API)
The Python standard library is an extensive suite of modules that comes with Python itself.
Many additional libraries are available from PyPI (the Python Package Index).
Library is a handy tool for Python to suit with many specific tasks, such as:
Famous Library | Used In Python for: |
![]() | Machine Learning (such as neural networks) |
![]() | Work with Complex data (for cross-validation) |
![]() | Machine Learning (such as sound waves analysis) |
![]() | SciPy is a library that uses NumPy for the purpose of solving mathematical functionsfor various commonly used tasks in scientific programming. |
![]() | Pandas is a library for data analysis that provides data structures of high-level and a range of tools for analysis. Pandas have so many inbuilt methods for grouping, combining, and filtering data. It used “DataFrame”* as the data structure for data This is the library I am going to use to perform SQL in Python. * DataFrame is a 2-dimensional labeled data structure with columns of potentially different types, like a spreadsheet or SQL table. DataFrame accepts many kinds of input. |
![]() | SQLAlchemy is a library that provides the communication between Python programs and databases. SQLAlchemy provides a standard interface that allows users to connect with a wide variety of database engines. This is the library I am going to use to perform SQL in Python. |
How to use libraries in Python:
for library already installed: use ‘import’ to load a library module into a program’s memory
for library that is not installed (using PyCharm) e.g. installing panadas library:
a. hover on the uninstalled library and click on “Install package xxx”


OR
b. Open File > Settings > Project from the PyCharm menu.
Select current project.
Click the Python Interpreter tab within your project tab.
Click the small + symbol to add a new library to the project.
Search the library to be installed and click Install Package.
Wait for the installation to terminate and close all popup windows.

Perform basic SQL with Pandas and SQLAlchemy
I will connect with a database created in SEM2 – CPRG251-Object-Oriented Programming Essentials as an example. There is also a video walkthrough at the end of the section.
Required Libraries (the demo will be connecting with Maria database):
Pandas
SQLAlchemy
SQL: Basic Select Statement
demo code: Select Statement (panda_mariadb.py)
import pandas as pd
import sqlalchemy
# 1.connect to database
engine = sqlalchemy.create_engine("mariadb+mariadbconnector://root:password@localhost:3306/coffeedb")
# 2. read table and turn it into DataFrame type
table_df = pd.read_sql_table('coffee', con=engine)
# 3. Display a summary on the DataFrame object
table_df.info()
# 4. Run SQL Select statement in DataFrame format
sql_df = pd.read_sql("SELECT * FROM coffee", con=engine)
# output result for query
print(sql_df)
# 5. Create a new DataFrame with certain Column from the given DataFrame Object
df = pd.DataFrame(sql_df, columns=['Description', 'Price'])
# output result
print(df)
break down:
| Purpose | Syntax & Output |
# 1. | [SQLAlchemy] create “engine” to connect with database, so that the program can interact with the database directly | The typical form of a database URL is: dialect+driver://username:password@host:port/database |
# 2. | [Pandas] Read SQL database table into a DataFrame. | pandas.read_sql_table( <table_name>, con=<engine>) |
# 3. | [Pandas] Print a concise summary of a DataFrame. | simply use variable.info() or print(variable.info()). Noted the variable must be storing a DataFrame object. We can study the number of entries, column name, count of non-null entries and datatype. Output: information of 'coffee' table ![]() |
# 4. | [Pandas] Read SQL query or database table into a DataFrame and print the result | pandas.read_sql( <SQL statement>, con=<engine>) Output: ![]() |
# 5. | Store the DataFrame with specific column and print the result | pandas. DataFrame(sql_df, columns=[list of column names]) Output: ![]() |
SQL: Insert and Delete Statement:
demo code: insert and delete statement (panda_mariadb_insert_and_delete.py)
import pandas as pd
import sqlalchemy
engine = sqlalchemy.create_engine("mariadb+mariadbconnector://root:password@localhost:3306/coffeedb")
# Display original Table
sql_df = pd.read_sql("SELECT * FROM coffee", con=engine)
print(sql_df)
# 1. Create new data using DataFrame
new_data_df = pd.DataFrame({
'Description': ['TEST1', 'TEST2'],
'ProdNum': ['99-001', '99-002'],
'Price': [9.99, 8.88]})
print(new_data_df)
# 2. Insert the new data into the table
new_data_df.to_sql('coffee', con=engine, if_exists='append', chunksize=1000, index=False)
sql_df2 = pd.read_sql("SELECT * FROM coffee", con=engine)
print(sql_df2)
# 3. Delete the newly added rows
engine.execute("DELETE FROM coffee WHERE Description = 'TEST1'")
engine.execute("DELETE FROM coffee WHERE Description = 'TEST2'")
sql_df3 = pd.read_sql("SELECT * FROM coffee", con=engine)
print(sql_df3)
Breakdown:
| Purpose | Syntax & Output |
# 1. | Make a DataFrame object containing new data you wanted to insert into the table | pandas.DataFrame({‘col1’:[‘val’],’col2’:[‘val2’]}) Output: ![]() |
# 2. | Insert the new data to the end of the table | DataFrame.to_sql(name, con, if_exits, chunksize, index) Output: ![]() |
# 3. | Execute Delete directly on the database through SQLAlchemy connection | engine.execute(“SQL statment”) Ouptut: ![]() |
p.s. Noted that there are multiple ways to perform SELECT, INSERT and DELETE by pandas, I am only demonstrating one of the way.
Following is a video to go through all the demo codes related to pandas, it serves as a summarize of this section, enjoy!
Conclusion
Python is a user-friendly programming language that is easy and fun to learn. The unlimited number of libraries also make it dynamic and excel in different tasks. There is vast possibility for many usages. The way it interacts with data is far beyond displaying table. With the help of Pandas library, is it easy to perform data cleansing, data fill, data normalization and data visualization.
I hope the article could help people who are interested in learning Python but don’t know where to start or who are fascinated about data science. In the next article, I will use one of the assignment from previous course as a case study to showcase the power of pandas. Stay tuned!
Reference
J. Hartman. “Java vs Python: Key Differences Between Java and Python?”, Guru99, December 12, 2021. [Online] Available: https://www.guru99.com/java-vs-python.html [Accessed: January 18, 2022]
Mosh. “Python for Beginners - Learn Python in 1 Hour”, YouTube, September 16, 2020. [Online] Available: https://www.youtube.com/watch?v=kqtD5dpn9C8 [Accessed: January 18, 2022]
W3Schools. “Python Tutorial”, W3Schools, [Online] Available: https://www.w3schools.com/python/ [Accessed: January 25, 2022]
D. Fugier. “Python Data Types”, Rhino Developer Docs, December 5, 2018 [Online] Available: https://developer.rhino3d.com/guides/rhinopython/python-datatypes/ [Accessed: January 25, 2022]
B. Weber. “Defining Main Functions in Python”, Real Python [Online] Available: https://realpython.com/python-main-function/
[Accessed: January 25, 2022]
T. Mester. “Python Built-in Functions and Methods (Python for Data Science Basics #3)”, Data36, October 25, 2017 [Online] Available: https://data36.com/python-built-in-functions-methods-python-data-science-basics-3/
[Accessed: January 30, 2022]
Programiz. “Python Operators”, Programiz, October 25, 2017 [Online] Available: https://www.programiz.com/python-programming/operators[Accessed: January 30, 2022]
Python. “9. Classes”, Python[Online] Available: https://docs.python.org/3/tutorial/classes.html[Accessed: February 1, 2022]
codebasics. “Python Pandas Tutorial 14: Read Write Data From Database (read_sql, to_sql)”, YouTube, May 6, 2018. [Online] Available: https://www.youtube.com/watch?v=M-4EpNdlSuY[Accessed: February 1, 2022]
Pandas. “User Guide”, Pandas. [Online] Available:https://pandas.pydata.org/docs/user_guide/index.html#user-guide[Accessed: February 8, 2022]
R. Hedgpeth. “Using SQLAlchemy with MariaDB Connector/Python: Part 1”, MariaDB, March 23, 2021. [Online] Available:https://mariadb.com/resources/blog/using-sqlalchemy-with-mariadb-connector-python-part-1/[February 15, 2022]
SQLAlchemy. “SQLAlchemy 1.4 Documentation”, SQLAlchemy. [Online] Available: https://docs.sqlalchemy.org[February 15, 2022]













Comments