Answers:
The urllib and urlparse modules have a few quirks. Here is a working example:
try:
    import urlparse
    from urllib import urlencode
except ImportError:  # For Python 3
    import urllib.parse as urlparse
    from urllib.parse import urlencode

url = "http://stackoverflow.com/search?q=question"
params = {'lang': 'en', 'tag': 'python'}

# Index 4 of the parsed result is the query string
url_parts = list(urlparse.urlparse(url))
query = dict(urlparse.parse_qsl(url_parts[4]))
query.update(params)

url_parts[4] = urlencode(query)

print(urlparse.urlunparse(url_parts))
ParseResult, the result of urlparse(), is read-only, so we need to convert it to a list before we can attempt to modify its data.
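To illustrate the read-only point, here is a minimal Python 3 sketch. The variable names (parts, mutable_parts, rebuilt, assignment_failed) are only illustrative and not part of the original answer.

```python
from urllib.parse import urlparse, urlunparse

parts = urlparse("http://stackoverflow.com/search?q=question")

# ParseResult is a namedtuple, so assigning to a field raises AttributeError
try:
    parts.query = "q=other"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# A list copy is mutable; index 4 holds the query component
mutable_parts = list(parts)
mutable_parts[4] = "q=other"
rebuilt = urlunparse(mutable_parts)
```

The list copy is what makes the `url_parts[4] = ...` assignment in the answer possible.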
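If you use parse_qs, call urlencode as urllib.urlencode(query, doseq=True). Otherwise, parameters that existed in the original URL will not be preserved correctly (because parse_qs returns them as lists).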
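A small Python 3 sketch of what doseq changes, assuming the query was parsed with parse_qs. The sample query string is made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# parse_qs maps every key to a list of values
query = parse_qs("q=question&tag=python&tag=url")

# Without doseq each list is converted with str() and then quoted,
# so the brackets and quotes of the list end up in the URL
without_doseq = urlencode(query)

# With doseq=True each list element becomes its own key=value pair
with_doseq = urlencode(query, doseq=True)
```

Here with_doseq round-trips the original query string, while without_doseq contains percent-encoded list syntax such as %5B.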
urlparse() and urlsplit() actually return namedtuple instances. So you can assign the result directly to a variable and use url_parts = url_parts._replace(query = …) to update it.
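The _replace() suggestion could look roughly like this in Python 3. The helper name with_params is hypothetical, and the sketch assumes each parameter appears at most once in the URL.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def with_params(url, params):
    """Return url with params merged into its query string (hypothetical helper)."""
    parts = urlsplit(url)
    # dict() keeps only the last value for a repeated key
    query = dict(parse_qsl(parts.query))
    query.update(params)
    # _replace() returns a new namedtuple; the original is left untouched
    return parts._replace(query=urlencode(query)).geturl()


result = with_params("http://stackoverflow.com/search?q=question", {"lang": "en"})
```

This avoids both the list conversion and the numeric index into the parsed result.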
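I was not satisfied with any of the solutions on this page (come on, where is our favorite copy-paste thing?), so I wrote my own based on the answers here. It tries to be complete and more Pythonic. I added a handler for dict and bool values in the arguments to make them friendlier to the consumer side (JS), but they are optional and you can drop them.
Test 1: adding new arguments, handling arrays and bool values: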
url = 'http://stackoverflow.com/test'
new_params = {'answers': False, 'data': ['some','values']}
add_url_params(url, new_params) == \
'http://stackoverflow.com/test?data=some&data=values&answers=false'
Test 2: rewriting existing args, handling dict values:
url = 'http://stackoverflow.com/test/?question=false'
new_params = {'question': {'__X__':'__Y__'}}
add_url_params(url, new_params) == \
'http://stackoverflow.com/test/?question=%7B%22__X__%22%3A+%22__Y__%22%7D'
The code itself. I have tried to describe it in detail:
from json import dumps

try:
    from urllib import urlencode, unquote
    from urlparse import urlparse, parse_qsl, ParseResult
except ImportError:
    # Python 3 fallback
    from urllib.parse import (
        urlencode, unquote, urlparse, parse_qsl, ParseResult
    )


def add_url_params(url, params):
    """ Add GET params to provided URL being aware of existing.

    :param url: string of target URL
    :param params: dict containing requested params to be added
    :return: string with updated URL

    >> url = 'http://stackoverflow.com/test?answers=true'
    >> new_params = {'answers': False, 'data': ['some','values']}
    >> add_url_params(url, new_params)
    'http://stackoverflow.com/test?data=some&data=values&answers=false'
    """
    # Unquoting URL first so we don't lose existing args
    url = unquote(url)
    # Extracting url info
    parsed_url = urlparse(url)
    # Extracting URL arguments from parsed URL
    get_args = parsed_url.query
    # Converting URL arguments to dict
    parsed_get_args = dict(parse_qsl(get_args))
    # Merging URL arguments dict with new params
    parsed_get_args.update(params)

    # Bool and Dict values should be converted to json-friendly values
    # you may throw this part away if you don't like it :)
    parsed_get_args.update(
        {k: dumps(v) for k, v in parsed_get_args.items()
         if isinstance(v, (bool, dict))}
    )

    # Converting URL argument to proper query string
    encoded_get_args = urlencode(parsed_get_args, doseq=True)
    # Creating new parsed result object based on provided with new
    # URL arguments. Same thing happens inside of urlparse.
    new_url = ParseResult(
        parsed_url.scheme, parsed_url.netloc, parsed_url.path,
        parsed_url.params, encoded_get_args, parsed_url.fragment
    ).geturl()

    return new_url
Please be aware that there may be some issues; if you find one, please let me know and we will make this thing better.
http://stackoverflow.com/with%2Fencoded?data=some&data=values&answe%2rs=false
Also, use three chevrons (>>>) to help doctest pick up your doctests.
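To see why the prompt matters, the following Python 3 sketch counts the examples that doctest can find in a docstring. Both to_query functions and count_doctests are hypothetical helpers written only for this illustration.

```python
import doctest
from urllib.parse import urlencode


def to_query_two_chevrons(params):
    """Encode a dict as a query string.

    >> to_query_two_chevrons({'lang': 'en'})
    'lang=en'
    """
    return urlencode(params)


def to_query_three_chevrons(params):
    """Encode a dict as a query string.

    >>> to_query_three_chevrons({'lang': 'en'})
    'lang=en'
    """
    return urlencode(params)


def count_doctests(func):
    # DocTestParser only recognises examples that start with the >>> prompt
    return len(doctest.DocTestParser().get_examples(func.__doc__))
```

With two chevrons the example is treated as plain text and never runs; with three it is collected as a test.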
Change parsed_get_args = dict(parse_qsl(get_args))
to parsed_get_args = parse_qs(get_args)
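You want to use URL encoding if the strings can contain arbitrary data (for example, characters such as ampersands, slashes, etc. will need to be encoded).
Check out urllib.urlencode: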
>>> import urllib
>>> urllib.urlencode({'lang':'en','tag':'python'})
'lang=en&tag=python'
In Python 3:
from urllib import parse
parse.urlencode({'lang':'en','tag':'python'})
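As an illustration of why encoding matters for arbitrary data, here is a short Python 3 sketch. The parameter values are made up to include an ampersand, an equals sign, a slash and a space.

```python
from urllib.parse import urlencode, parse_qsl

params = {'q': 'a&b=c', 'path': 'x/y z'}

# Reserved characters are percent-encoded and the space becomes '+'
encoded = urlencode(params)

# Decoding the query string recovers the original values
decoded = dict(parse_qsl(encoded))
```

Without the encoding step, the & and = inside the first value would be read as separators for extra parameters.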
You can also use the furl module: https://github.com/gruns/furl
>>> from furl import furl
>>> print furl('http://example.com/search?q=question').add({'lang':'en','tag':'python'}).url
http://example.com/search?q=question&lang=en&tag=python
If you are using the requests library:
import requests
...
params = {'tag': 'python'}
requests.get(url, params=params)
Yes: use urllib.
From the examples in the documentation:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.geturl() # Prints the final URL with parameters.
>>> print f.read() # Prints the contents
Based on this answer, a one-liner for simple cases (Python 3 code):
from urllib.parse import urlparse, urlencode
url = "https://stackoverflow.com/search?q=question"
params = {'lang':'en','tag':'python'}
url += ('&' if urlparse(url).query else '?') + urlencode(params)
Or:
url += ('&', '?')[urlparse(url).query == ''] + urlencode(params)
If there is a ? in the anchor (#?stuff), it will not work properly.
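The anchor problem can be reproduced with a short Python 3 sketch. The helper append_params is hypothetical; it shows one possible way to keep the fragment at the end, not the approach of the answer above.

```python
from urllib.parse import urlsplit, urlencode

url = "https://stackoverflow.com/search#?stuff"
params = {'lang': 'en'}

# Plain concatenation appends after the fragment, so the new
# parameters become part of the anchor instead of the query
naive = url + ('&' if urlsplit(url).query else '?') + urlencode(params)


def append_params(url, params):
    """Append params to the query while leaving the fragment last (hypothetical helper)."""
    parts = urlsplit(url)
    extra = urlencode(params)
    query = (parts.query + '&' + extra) if parts.query else extra
    return parts._replace(query=query).geturl()


fixed = append_params(url, params)
```

Because the ? sits inside the fragment, the parsed query is empty, and only a version that reassembles the URL places the parameters before the #.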
I find this more elegant than the two top answers:
from urllib.parse import urlencode, urlparse, parse_qs


def merge_url_query_params(url: str, additional_params: dict) -> str:
    url_components = urlparse(url)
    original_params = parse_qs(url_components.query)
    # Before Python 3.5 you could update original_params with
    # additional_params, but here all the variables are immutable.
    merged_params = {**original_params, **additional_params}
    updated_query = urlencode(merged_params, doseq=True)
    # _replace() is how you can create a new NamedTuple with a changed field
    return url_components._replace(query=updated_query).geturl()


assert merge_url_query_params(
    'http://example.com/search?q=question',
    {'lang': 'en', 'tag': 'python'},
) == 'http://example.com/search?q=question&lang=en&tag=python'
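The most important things I dislike in the top answers (they are nevertheless good): having to remember the index of query in the URL components, and the verbose way of creating the updated ParseResult.
What is bad about my response is the magic-looking dict merge using unpacking, but I prefer that to updating an already existing dictionary because of my prejudice against mutability.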
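The mutability trade-off can be sketched in a few lines of Python 3. The dictionaries below are made-up sample data.

```python
original_params = {'q': ['question']}
additional_params = {'lang': 'en', 'q': 'answer'}

# Unpacking builds a new dict; later keys win and the inputs stay untouched
merged = {**original_params, **additional_params}

# update() gives the same result but changes the dict it is called on
in_place = {'q': ['question']}
in_place.update(additional_params)
```

Both variants produce the same mapping; the difference is only whether an existing object is modified.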
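I like Łukasz's version, but since the urllib and urlparse functions are somewhat awkward to use in this case, I think it is more straightforward to do something like this: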
import urllib
import urlparse

params = urllib.urlencode(params)

# Index 4 of the parsed result is the existing query string
if urlparse.urlparse(url)[4]:
    print url + '&' + params
else:
    print url + '?' + params
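Use the various urlparse functions to tear apart the existing URL, urllib.urlencode() on the combined dictionary, then urlparse.urlunparse() to put it all back together again.
Or just take the result of urllib.urlencode() and concatenate it to the URL appropriately.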
Yet another answer:
import urllib
import urlparse


def addGetParameters(url, newParams):
    (scheme, netloc, path, params, query, fragment) = urlparse.urlparse(url)
    # A list of pairs keeps repeated keys and blank values from the original URL
    queryList = urlparse.parse_qsl(query, keep_blank_values=True)
    for key in newParams:
        queryList.append((key, newParams[key]))
    return urlparse.urlunparse((scheme, netloc, path, params, urllib.urlencode(queryList), fragment))
Here is how I implemented it.
import urllib

params = urllib.urlencode({'lang':'en','tag':'python'})
url = ''
if request.GET:
    url = request.url + '&' + params
else:
    url = request.url + '?' + params
Works like a charm. However, I would have liked a cleaner way to implement this.
Another way of implementing the above is to put it in a method.
import urllib


def add_url_param(request, **params):
    new_url = ''
    _params = dict(**params)
    _params = urllib.urlencode(_params)

    if _params:
        if request.GET:
            new_url = request.url + '&' + _params
        else:
            new_url = request.url + '?' + _params
    else:
        new_url = request.url

    return new_url
In Python 2.5:
import cgi
import urllib
import urlparse


def add_url_param(url, **params):
    n = 3
    parts = list(urlparse.urlsplit(url))
    d = dict(cgi.parse_qsl(parts[n]))  # use cgi.parse_qs for list values
    d.update(params)
    parts[n] = urllib.urlencode(d)
    return urlparse.urlunsplit(parts)


url = "http://stackoverflow.com/search?q=question"
add_url_param(url, lang='en') == "http://stackoverflow.com/search?q=question&lang=en"
Use urlparse.parse_qs rather than parse_qsl. The latter returns a list, whereas you want a dict. See docs.python.org/library/urlparse.html#urlparse.parse_qs.
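The difference between the two functions can be checked with a small Python 3 sketch, where both live in urllib.parse. The query string is made up for illustration.

```python
from urllib.parse import parse_qs, parse_qsl

query = "q=question&tag=python&tag=url"

# parse_qsl returns a list of (key, value) pairs in their original order
as_pairs = parse_qsl(query)

# parse_qs returns a dict that maps each key to a list of its values
as_dict = parse_qs(query)

# dict() over the pairs keeps only the last value of a repeated key
collapsed = dict(as_pairs)
```

Which one fits depends on whether repeated keys must survive: parse_qs keeps them all, while dict(parse_qsl(...)) silently drops the earlier values.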