mkalioby/leopards: Query your python lists

2024-11-15 01:18:00
github.com

Leopards is a way to query list of dictionaries or objects as if you are filtering in DBMS.
You can get dicts/objects that are matched by OR, AND or NOT or all of them.
As you can see in the comparison they are much faster than Pandas.

from leopards import Q
l = [{"name":"John","age":"16"}, {"name":"Mike","age":"19"},{"name":"Sarah","age":"21"}]
filtered= Q(l,{'name__contains':"k", "age__lt":20})
print(list(filtered))

output

[{'name': 'Mike', 'age': '19'}]

The above filtration can be written as

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, name__contains="k", age__lt=20)

Notes:

Q returns an iterator which can be converted to a list by calling list.
Even though, age was str in the dict, as the value of in the query dict was int, Leopards converted the value in dict automatically to match the query data type. This behaviour can be stopped by passing False to convert_types parameter.

eq: equals and this default filter
gt: greater than.
gte: greater than or equal.
lt: less than
lte: less than or equal
in: the value in a list of a tuple.
contains: contains a substring as in the example.
icontains: case-insensitive contains.
startswith: checks if a value starts with a query strings.
istartswith: case-insensitive startswith.
endswith: checks if a value ends with a query strings.
iendswith: case-insensitive endswith.
isnull: checks if the value matches any of NULL_VALUES which are ('', '.', None, "None", "null", "NULL")
- e.g. filter__isnull=True or filter__isnull=False

For eq,gt,gte,lt,lte, in, contains, icontains, startswith,istartswith, endswith and iendswith, you can add a n to negate the results. e.g nin which is equivalent to not in

This section will cover the use of OR, AND and NOT

OR or __or__ takes a list of dictionaries to evaluate and returns with the first True.

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"OR": [{"name__contains": "k"}, {"age__gte": 21}]})
print(list(filtered))

output

[{'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]

NOT or __not__ takes a dict for query run.

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"age__gt": 15, "NOT": {"age__eq": 19}})
print(list(filtered))

output

[{'name': 'John', 'age': '16'}, {'name': 'Sarah', 'age': '21'}]

AND or __and__ takes a list of dict for query run, returns with the first False.

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"__and__": [{"age__gte": 15}, {"age__lt": 21}]})
print(list(filtered))

output

[{'name': 'John', 'age': '16'}, {'name': 'Mike', 'age': '19'}]

You can run the following aggregations

Find the count of certain aggregated column

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"},{"name":"John","age":"19"}]
from leopards import Count
count = Count(l,['age'])

output

[{"age":"16","count":1},{"age":"19","count":2}, {"age":"21","count":1}]

Find the Max value for a certain column in certain aggregated columns

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"},{"name":"Joh","age":"19"}]
from leopards import Max
count = Max(l,"age",['name'],dtype=int)

output

[{'name': 'John', 'age': '19'}, {'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]

Notes:

If you don’t pass the aggregation columns, the maximum will be found across dataset.
You can pass the datatype of the column to convert it on the fly while evaluating

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"},{"name":"Joh","age":"19"}]
from leopards import Max
m = Max(l,"age",dtype=int)

output

Find the Max value for a certain column in certain aggregated columns

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"},{"name":"Joh","age":"19"}]
from leopards import Min
m = Min(l,"age",['name'])

output

[{'name': 'John', 'age': '16'}, {'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]

Note:

If you don’t pass the aggregation columns, the min will be found across dataset.
You can pass the datatype of the column to convert it on the fly while evaluating

Like Min and Max but only works with integers and floats.

This is done on Python 3.8 running on Ubuntu 22.04 on i7 11th generation and 32 GB of RAM.

Comparison	Pandas	Leopards
Package Size (Lower is better)	29.8 MB	7.5 KB
import Time (Worst) (Lower is better)	146 ms	1.05 ms
load 10k CSV lines (Lower is better) ^[1]	0.295s	0.138s
get first matched record (Lower is better)	0.310s	0.017s
print all filtered records (10/10k) (Lower is better)	0.310s	0.137s
filter by integers (Lower is better)	0.316s	0.138s

^[1] This was loading the whole csv in memory which was for sake of fair comparison.
Nevertheless, Leopards can work with DictReader as an iterable which executes in 0.014s, then it handles line by line.

Thanks for Asma Tahir for Pandas stats.

Source Link

Support Techcratic

If you find value in Techcratic’s insights and articles, consider supporting us with Bitcoin. Your support helps me, as a solo operator, continue delivering high-quality content while managing all the technical aspects, from server maintenance to blog writing, future updates, and improvements. Support Innovation! Thank you.

Bitcoin Address:

bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge

Please verify this address before sending funds.

Bitcoin QR Code

Simply scan the QR code below to support Techcratic.

Please read the Privacy and Security Disclaimer on how Techcratic handles your support.

Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.

Tags: HACKER NEWS

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

mkalioby/leopards: Query your python lists

Support Techcratic

Bitcoin QR Code

EKR Custom Fit Model X Car Seat Covers for Select Tesla Model X 2020 2021 2022 2023…

Russell Westbrook posts 200th career triple-double in win over Grizzlies

Related Posts

Leave a Reply Cancel reply

Tech News

Tech News

Tech News

Tech News​

Site Links

Tech News