Splitting a semicolon-separated string to a dictionary, in Python


84

I have a string that looks like this:

"Name1=Value1;Name2=Value2;Name3=Value3"

Is there a built-in class/function in Python that will take that string and construct a dictionary, as though I had done this:

dict = {
    "Name1": "Value1",
    "Name2": "Value2",
    "Name3": "Value3"
}

I have looked through the modules available but can't seem to find anything that matches.


Thanks, I do know how to write the relevant code myself, but since such smallish solutions are usually minefields waiting to happen (e.g. someone writes Name1='Value1=2';), I usually prefer a pre-tested function.

I'll do it myself then.


Does your question require support for input such as s = r'Name1='Value=2';Name2=Value2;Name3=Value3;Name4="Va\"lue;\n3"' (note: a semicolon inside a quoted string, a quote escaped with a backslash, a \n escape, and both single and double quotes)?
jfs

This question of mine is over 6 years old, the code which involved this has long since been replaced :) And no, it didn't require support for quotes. I just wanted to have a prebuilt function instead of writing something myself. However, the code is long gone.
Lasse V. Karlsen

Answers:


143

There's no builtin, but you can accomplish this fairly simply with a generator expression:

s = "Name1=Value1;Name2=Value2;Name3=Value3"
dict(item.split("=") for item in s.split(";"))
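The one-liner above raises ValueError as soon as a value contains an extra = sign. A minimal sketch that splits each item only once is shown here; the name parse_pairs and the decision to skip empty items (for example after a trailing semicolon) are assumptions made for illustration, and quoting is still not handled.

```python
def parse_pairs(text, item_sep=";", kv_sep="="):
    """Split 'a=1;b=2' into {'a': '1', 'b': '2'} (no quoting support)."""
    result = {}
    for item in text.split(item_sep):
        if not item:
            # Skip empty items, e.g. the one produced by a trailing separator.
            continue
        # maxsplit=1 keeps any later '=' characters inside the value.
        # An item with no '=' at all still raises ValueError here.
        name, value = item.split(kv_sep, 1)
        result[name] = value
    return result


print(parse_pairs("Name1=Value1;Name2=Value2;Name3=Value3"))
```

A semicolon inside a quoted value would still be treated as a separator by this sketch.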

[Edit] Your update indicates you may need to handle quoting. This complicates things, depending on the exact format you need (which quote characters are accepted, which escape characters, etc.). You may want to look at the csv module to see if it can cover your format. Here's an example. (Note that the API is a little clunky for this purpose, as csv is designed to iterate through a sequence of records, hence the Python 2 .next() calls I'm making to look at just the first line. Adjust to suit your needs.)

>>> import csv
>>> s = "Name1='Value=2';Name2=Value2;Name3=Value3"
>>> dict(csv.reader([item], delimiter='=', quotechar="'").next()
...      for item in csv.reader([s], delimiter=';', quotechar="'").next())
{'Name2': 'Value2', 'Name3': 'Value3', 'Name1': 'Value=2'}
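On Python 3 a csv reader has no .next() method; the built-in next() function takes its place. The following is a sketch of the same two-level csv approach for Python 3, with the intermediate variable names (items, result) chosen here for readability rather than taken from the answer.

```python
import csv

s = "Name1='Value=2';Name2=Value2;Name3=Value3"

# The outer reader treats the whole string as one record split on ';'.
items = next(csv.reader([s], delimiter=";", quotechar="'"))

# The inner reader splits each item on '=', honouring a quote that
# opens a field, so the '=' inside 'Value=2' is kept as part of the value.
result = dict(
    next(csv.reader([item], delimiter="=", quotechar="'"))
    for item in items
)
print(result)
```

Because csv only recognises a quote at the start of a field, a semicolon inside a quoted value is still split by the outer reader.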

Depending on the exact structure of your format, you may need to write your own simple parser however.
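As one possible shape for such a parser, here is a sketch built on the standard shlex module, configured so that ; is the only token separator. The function name parse_quoted_pairs is hypothetical, and the quoting rules are shlex's POSIX-style rules, which may or may not match the format you actually have.

```python
import shlex


def parse_quoted_pairs(text):
    """Parse "a='x;y';b=2" into a dict, using shell-like quoting rules."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = ";"        # only ';' separates tokens
    lexer.whitespace_split = True  # do not split on punctuation such as '='
    lexer.commenters = ""         # do not treat '#' as a comment start
    result = {}
    for token in lexer:
        # Quotes have been removed by shlex; split on the first '=' only.
        name, sep, value = token.partition("=")
        if not sep:
            raise ValueError("missing '=' in item: %r" % token)
        result[name] = value
    return result


print(parse_quoted_pairs("Name1='Value;2';Name2=Value2;Name3=Value3"))
```

This keeps a semicolon or an equals sign inside a quoted value intact, but a quoted name containing = would still be split in the wrong place.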


the code doesn't handle quoting, try: s = "Name1='Value;2';Name2=Value2;Name3=Value3" (note: semicolon in the quoted Name1 value).
jfs

1
I have no idea why the second example throws AttributeError: '_csv.reader' object has no attribute 'next' for me. Of course I did import csv.
Youngjae

@Brian Is there any way to store the values as integer rather than string?
ChasedByDeath
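For the integer question just above, one way is to convert each value while building the dictionary. This is a sketch that assumes every value is a valid integer literal; the sample string is made up for illustration.

```python
s = "Name1=1;Name2=2;Name3=3"

# Split each item once on '=', then convert the value with int().
# int() raises ValueError for anything that is not an integer literal.
result = {
    name: int(value)
    for name, value in (item.split("=", 1) for item in s.split(";"))
}
print(result)
```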

6

This comes close to doing what you wanted:

>>> import urlparse
>>> urlparse.parse_qs("Name1=Value1;Name2=Value2;Name3=Value3")
{'Name2': ['Value2'], 'Name3': ['Value3'], 'Name1': ['Value1']}
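On Python 3 the urlparse module lives at urllib.parse, and recent releases split query strings on & only unless told otherwise. A sketch of the equivalent call, assuming a Python version whose parse_qs accepts the separator argument:

```python
from urllib.parse import parse_qs

s = "Name1=Value1;Name2=Value2;Name3=Value3"

# ';' has to be requested explicitly; the default separator is '&'.
result = parse_qs(s, separator=";")
print(result)
```

As with the Python 2 version, every value comes back wrapped in a list, and percent-escapes in the input are decoded.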

2
it breaks if there is & or % in the input.
jfs

@jfs but the string does not contain either of those.
Vishal Singh

@VishalSingh: most visitors on StackOverflow are from google and therefore answers here are not only for the original poster who asked the question. If I came here looking for how to parse a "semicolon-separated string to a dictionary, in Python" then my strings might contain & or % -- at the very least, it is worth mentioning that the answer doesn't work for such strings.
jfs

3
s1 = "Name1=Value1;Name2=Value2;Name3=Value3"

dict(map(lambda x: x.split('='), s1.split(';')))

1

This can be done simply with str.join and a list comprehension (note that this goes the other way, from a dictionary to a string):

>>> d = {'a': 1, 'b': 2}
>>> ','.join(['%s=%s' % x for x in d.items()])
'a=1,b=2'

-2
easytiger $ cat test.out test.py | sed 's/^/    /'
p_easytiger_quoting:1.84563302994
{'Name2': 'Value2', 'Name3': 'Value3', 'Name1': 'Value1'}
p_brian:2.30507516861
{'Name2': 'Value2', 'Name3': "'Value3'", 'Name1': 'Value1'}
p_kyle:7.22536420822
{'Name2': ['Value2'], 'Name3': ["'Value3'"], 'Name1': ['Value1']}
import timeit
import urlparse

s = "Name1=Value1;Name2=Value2;Name3='Value3'"

def p_easytiger_quoting(s):
    d = {}
    s = s.replace("'", "")
    for x in s.split(';'):
        k, v = x.split('=')
        d[k] = v
    return d


def p_brian(s):
    return dict(item.split("=") for item in s.split(";"))

def p_kyle(s):
    return urlparse.parse_qs(s)



print "p_easytiger_quoting:" + str(timeit.timeit(lambda: p_easytiger_quoting(s)))
print p_easytiger_quoting(s)


print "p_brian:" + str(timeit.timeit(lambda: p_brian(s)))
print p_brian(s)

print "p_kyle:" + str(timeit.timeit(lambda: p_kyle(s)))
print p_kyle(s)

This doesn't answer the question, because it doesn't handle quoting. Try s = "Name1='Value1=2';Name2=Value2" and csv (as in Brian's accepted answer) or parse_qs (as in Kyle's) will get it right, while yours will raise a ValueError. The OP specifically says "such smallish solutions are usually mine-fields waiting to happen", which is why he wants a built-in or other well-tested solution, and he gives an example that will break your code.
abarnert

Ahh, I didn't see that. Still, it would be faster than all your solutions to preparse those in the main string before the iteration takes place and recalling the replace function thousands of times. I will update.
easytiger

I'm not sure how you're going to preparse it. But even if you do, this seems like exactly what the OP was afraid of in a simple solution. Are you sure there are no other mines ahead? Can you prove it to the OP's satisfaction?
abarnert

OK, now that I've seen your edit… First, s.replace doesn't do anything at all; it just returns a new string that you ignore. Second, even if you got it right (s = s.replace…), that doesn't fix the problem, it just adds a new one on top of it. Try it on either my example or the OP's.
abarnert

The specification clearly includes handling the sample input he mentioned in his question, Name1='Value1=2';. And your code doesn't handle it. And I'm not sure how you'd sanitize that without parsing it in some way that will be just as slow as urlparse or csv in the first place.
abarnert

-2

If your Value1, Value2 are just placeholders for actual values, you can also use the dict() function in combination with eval().

>>> s = "Name1=1;Name2=2;Name3='string'"
>>> print eval('dict('+s.replace(';',',')+')')
{'Name2': 2, 'Name3': 'string', 'Name1': 1}

This is because the dict() function understands the syntax dict(Name1=1, Name2=2, Name3='string'). Spaces in the string (e.g. after each semicolon) are ignored. But note that the string values do require quoting.


Thanks, upvote string.replace worked well. Don't know why I couldn't split. I did i = textcontrol.GetValue() on tc box, then o = i.split(';') but didn't output a string just complained about format, unlike replace.
Iancovici

1
The s.replace(';', ',')-based solution breaks if there is a ; inside a quoted value. eval is evil, and it is unnecessary in this case.
jfs
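To illustrate the point about eval being unnecessary, here is a sketch that gets the same typed values with ast.literal_eval, which only accepts Python literals and never executes code. It assumes every value is written as a Python literal (so bare words such as Value1 would be rejected), and it still splits on every semicolon, quoted or not.

```python
import ast

s = "Name1=1;Name2=2;Name3='string'"

result = {}
for item in s.split(";"):
    name, value = item.split("=", 1)
    # literal_eval parses numbers, strings and other literals only;
    # arbitrary expressions in the input raise an error instead of running.
    result[name.strip()] = ast.literal_eval(value.strip())
print(result)
```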
Licensed under cc by-sa 3.0 with attribution required.