Methods for Filtering your Data in Pandas

Anurag Kuche
4 min read · May 28, 2020

Different ways we could filter our data frames in Pandas

There are three main ways to filter data from a DataFrame.

1. loc: This filters data by label. Use it when you select by index labels or column names (it works for both rows and columns).

2. iloc: This filters data by integer position. Use it when you select rows or columns by their numeric position (0, 1, 2, ...), whatever their labels are.

3. ix: This gave the flexibility of filtering the data with either integer positions or labels. Because mixing the two was confusing, df.ix was deprecated in pandas 0.20 and removed in pandas 1.0, so use loc or iloc instead.
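Since ix is no longer available in current pandas, a lookup that mixed a position with a label can be rewritten with loc or iloc. The sketch below is only an illustration: the frame, its column names (type, gas, price) and its row labels are made up and are not the article's dataset.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck"],
        "gas": ["petrol", "diesel", "diesel"],
        "price": [20000, 35000, 40000],
    },
    index=["car_a", "car_b", "car_c"],
)

# Goal: "first row, column named gas", which older code wrote with ix.

# Option 1: turn the row position into a label, then use loc.
via_loc = df.loc[df.index[0], "gas"]

# Option 2: turn the column label into a position, then use iloc.
via_iloc = df.iloc[0, df.columns.get_loc("gas")]

print(via_loc, via_iloc)  # petrol petrol
```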

Importing the necessary data
Figure 1: Importing data
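A minimal sketch of this step, assuming a small hand-built frame instead of a file (with a CSV file you would typically pass its path to pd.read_csv). The column names type, gas and price and the row labels are hypothetical stand-ins, not the dataset used in the figures.

```python
import pandas as pd

# Hypothetical data: four rows with text labels as the index.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck", "coupe"],
        "gas": ["petrol", "diesel", "diesel", "petrol"],
        "price": [20000, 35000, 40000, 28000],
    },
    index=["car_a", "car_b", "car_c", "car_d"],
)

print(df)
print(df.index.tolist())    # row labels, used by loc
print(df.columns.tolist())  # column names, used by loc
```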

DF.LOC()

df.loc selects the rows and columns that match the given labels or satisfy a given condition. Here, labels means index values or column names.

The general format of loc: df.loc[row labels, column names]
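Filtering by condition can be sketched as follows: a boolean Series goes in the row slot of loc and the wanted column names go in the column slot. The frame and the price threshold are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck", "coupe"],
        "gas": ["petrol", "diesel", "diesel", "petrol"],
        "price": [20000, 35000, 40000, 28000],
    },
    index=["car_a", "car_b", "car_c", "car_d"],
)

# Boolean mask: True for the rows we want to keep.
mask = df["price"] > 25000

# Rows where the mask is True, restricted to two columns.
expensive = df.loc[mask, ["type", "price"]]
print(expensive)
```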

Figure 2: loc filtering for a single value

The loc method can also return a single entry: when we pass one row label and one column name, we get back the value at their intersection.
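As a small sketch of a single-value lookup, assuming a made-up frame with hypothetical labels and column names:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck", "coupe"],
        "gas": ["petrol", "diesel", "diesel", "petrol"],
        "price": [20000, 35000, 40000, 28000],
    },
    index=["car_a", "car_b", "car_c", "car_d"],
)

# One row label and one column name give back one value.
value = df.loc["car_b", "gas"]
print(value)  # diesel
```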

# A Series object is returned if we pass a single row label or a single column name to loc (or a single position to iloc)

Figure 3: loc filtering for Series
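A short sketch of both Series cases, one whole row and one whole column, again on a made-up frame with hypothetical names:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck", "coupe"],
        "gas": ["petrol", "diesel", "diesel", "petrol"],
        "price": [20000, 35000, 40000, 28000],
    },
    index=["car_a", "car_b", "car_c", "car_d"],
)

# A single row label returns that row as a Series indexed by column name.
row = df.loc["car_b"]

# A full row slice plus one column name returns that column as a Series.
col = df.loc[:, "gas"]

print(type(row).__name__, type(col).__name__)  # Series Series
```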

The general format of loc with lists: df.loc[[row labels], [column names]]

# A DataFrame object is returned if we pass lists of labels to loc (or lists of positions to iloc)

Figure 4: loc filtering for Dataframe-I

# Square brackets should be used when we pass several row or column names

Figure 5: loc filtering for dataframes-list
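The list form can be sketched like this, assuming a made-up frame with hypothetical names. Note that a list with a single label still returns a DataFrame, not a Series.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck", "coupe"],
        "gas": ["petrol", "diesel", "diesel", "petrol"],
        "price": [20000, 35000, 40000, 28000],
    },
    index=["car_a", "car_b", "car_c", "car_d"],
)

# Lists of row labels and column names return a DataFrame.
subset = df.loc[["car_a", "car_c"], ["type", "price"]]
print(subset)

# A one-element list keeps the result two-dimensional.
one_row = df.loc[["car_a"]]
print(type(one_row).__name__)  # DataFrame
```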

DF.ILOC()

df.iloc: this is similar to loc, but the ‘i’ stands for integer, which means it filters data only by integer position and not by label as in the examples above.

In loc, we use the names of the columns, like type and gas, but in iloc we can only use their integer positions.

# A single value is returned if we pass one row position and one column position to iloc

Figure 6: iloc filtering for value
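A small sketch of a positional single-value lookup, assuming a made-up frame (the names are hypothetical). Positions start at 0, so row 1 is the second row and column 1 is the second column.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck", "coupe"],
        "gas": ["petrol", "diesel", "diesel", "petrol"],
        "price": [20000, 35000, 40000, 28000],
    },
    index=["car_a", "car_b", "car_c", "car_d"],
)

# Second row, second column, addressed purely by position.
value = df.iloc[1, 1]
print(value)  # diesel
```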

# A Series object is returned if we pass a single row position or a single column position to iloc

Figure 7: iloc filtering for series
Figure 8: iloc filtering for dataframe - List
Figure 9: iloc filtering for dataframe - Slicing
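The three iloc cases (Series, list and slice) can be sketched together, assuming a made-up frame with hypothetical names:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck", "coupe"],
        "gas": ["petrol", "diesel", "diesel", "petrol"],
        "price": [20000, 35000, 40000, 28000],
    },
    index=["car_a", "car_b", "car_c", "car_d"],
)

# Series: one row position, or all rows with one column position.
row = df.iloc[1]
col = df.iloc[:, 1]

# DataFrame from lists of positions: rows 0 and 2, columns 0 and 2.
picked = df.iloc[[0, 2], [0, 2]]

# DataFrame from slices: rows 0-1 and columns 0-1 (stop position excluded).
sliced = df.iloc[0:2, 0:2]

print(picked)
print(sliced)
```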

The difference between loc and iloc is illustrated below: the last entry of a slice is not included in iloc, but it is included in loc.

Figure 10: iloc method (comparison with loc)

In the iloc output above, the life expectancy column and row 5 are not included, although the slice bounds are the same as in the loc call.

Figure 11: the difference between loc and iloc

Tip to remember: iloc slicing excludes the last element of the range, just like Python list slicing, whereas loc slicing includes it.
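A minimal sketch of the tip, assuming a made-up frame with the default integer index so that the same bounds 0:2 can be passed to both loc and iloc (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical data with the default index 0, 1, 2, 3.
df = pd.DataFrame(
    {
        "type": ["sedan", "suv", "truck", "coupe"],
        "gas": ["petrol", "diesel", "diesel", "petrol"],
        "price": [20000, 35000, 40000, 28000],
    }
)

# loc slices by label and includes the stop label: rows 0, 1 and 2.
rows_loc = df.loc[0:2]

# iloc slices by position and excludes the stop position: rows 0 and 1.
rows_iloc = df.iloc[0:2]

# The same holds for columns.
cols_loc = df.loc[:, "type":"gas"]  # type and gas
cols_iloc = df.iloc[:, 0:1]         # type only

# Plain Python list slicing behaves like iloc.
first_two = [10, 20, 30, 40][0:2]

print(len(rows_loc), len(rows_iloc), first_two)  # 3 2 [10, 20]
```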

References:

  1. pandas indexing and filtering: https://pandas.pydata.org/docs/user_guide/indexing.html#basics
