I have a pandas DataFrame containing network traffic from multiple hosts.
df = pd.read_csv("traffic.csv", skipinitialspace=True, usecols=['frame.time_epoch','ip.src','ip.dst','tcp.srcport','tcp.dstport','frame.len','tcp.flags','Protocol'], na_filter=False, encoding="utf-8")
complete = pd.read_csv("traffic.csv", skipinitialspace=True, usecols=['frame.time_epoch','ip.src','ip.dst','tcp.srcport','tcp.dstport','frame.len','tcp.flags','Protocol'], na_filter=False, encoding="utf-8")
I would like to group the traffic into flows, with the sign of 'frame.len' indicating the direction of each packet. To do so, for each host I first set the sign of 'frame.len' by comparing 'ip.dst' with the host IP, so that incoming packets get a negative length:
complete.loc[(complete['ip.dst'] == hostip[i]) ,'frame.len'] = complete['frame.len'] * -1
Then I swap 'ip.src' and 'tcp.srcport' with 'ip.dst' and 'tcp.dstport' (and vice versa) for incoming packets, i.e. those whose 'frame.len' is now negative:
complete.loc[(complete['frame.len'] < 0),'ip.src'] = df['ip.dst']
complete.loc[(complete['frame.len'] < 0),'ip.dst'] = df['ip.src']
complete.loc[(complete['frame.len'] < 0),'tcp.srcport'] = df['tcp.dstport']
complete.loc[(complete['frame.len'] < 0),'tcp.dstport'] = df['tcp.srcport']
Finally, I group the packets into flows using the following keys:
complete_flow = complete.groupby(['ip.src','ip.dst','tcp.srcport','tcp.dstport','Protocol'])
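For reference, here is a minimal, self-contained sketch of the steps above on made-up data. The sample rows and the `hostip` list are hypothetical stand-ins for my real capture, and the in-memory DataFrame replaces the CSV read:

```python
import pandas as pd

# Hypothetical sample rows standing in for traffic.csv
df = pd.DataFrame({
    'frame.time_epoch': [1.0, 2.0, 3.0],
    'ip.src': ['10.0.0.1', '8.8.8.8', '10.0.0.1'],
    'ip.dst': ['8.8.8.8', '10.0.0.1', '8.8.8.8'],
    'tcp.srcport': [5000, 443, 5000],
    'tcp.dstport': [443, 5000, 443],
    'frame.len': [100, 1500, 60],
    'tcp.flags': ['0x02', '0x12', '0x10'],
    'Protocol': ['TCP', 'TCP', 'TCP'],
})
hostip = ['10.0.0.1']  # hypothetical list of monitored host addresses

complete = df.copy()

# Step 1: make frame.len negative for packets arriving at a host
for ip in hostip:
    to_host = complete['ip.dst'] == ip
    complete.loc[to_host, 'frame.len'] = complete.loc[to_host, 'frame.len'] * -1

# Step 2: swap source and destination for incoming (negative) packets,
# reading the original values from the untouched copy in df
incoming = complete['frame.len'] < 0
complete.loc[incoming, 'ip.src'] = df.loc[incoming, 'ip.dst']
complete.loc[incoming, 'ip.dst'] = df.loc[incoming, 'ip.src']
complete.loc[incoming, 'tcp.srcport'] = df.loc[incoming, 'tcp.dstport']
complete.loc[incoming, 'tcp.dstport'] = df.loc[incoming, 'tcp.srcport']

# Step 3: group packets of both directions into one flow
complete_flow = complete.groupby(
    ['ip.src', 'ip.dst', 'tcp.srcport', 'tcp.dstport', 'Protocol']
)
print(complete_flow['frame.len'].sum())
```

With this sample, the request and its reply end up under the same key, which is the behavior I am after.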
Is there a simpler way to do this using pandas DataFrame features?